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ls Microsoft Leaving Small 
Businesses Behind? 


hese days, the name Microsoft is virtually 

synonymous with enterprise. However, the enter- 
prise isn’t where Microsoft came from: Microsoft got 
its start with MS-DOS and later Windows on personal 
computers. Consumers liked using Windows at home, 
and Windows worked its way into businesses from 
the bottom up. Back then Windows was designed for 
consumer ease of use. 

Few thought Microsoft could be a contender 
when it entered the server market with Windows 
Server NT sporting the easy-to-use Windows front 
end. SQL Server 6.0 and SQL Server 6.5 were still 
considered small business or departmental data- 
base platforms that couldn’t compete with the likes 
of Oracle and DB2. However, businesses looking 
for simpler and less costly solutions began choosing 
Microsoft products (e.g., Word over WordPerfect, 
Excel over Lotus 123, and Windows NT over Novell 
NetWare), giving Microsoft a foothold in the enterprise. 

At that time, IBM was the dominant force in 
business. IBM solutions were all targeted at enterprise 
customers. IBM took all of its feedback and product 
direction from its biggest and most important 
customers, and IBM products became more sophisti- 
cated, more complex, and more expensive. Now IBM 
products are so user unfriendly and complex that most 
of them serve as gateways for IBM to sell consulting 
services. 

Nowadays, Microsoft is the dominant player in 
the enterprise market. Perhaps inevitably, as Micro- 
soft grew into an enterprise player, the complexity 
of its products grew as well. Instead of looking to 
sell first to consumers, Microsoft now looks to sell 
to businesses. The company now focuses less on the 
ease of use of its products than on enterprise scal- 
ability and security. Even though Microsoft might 
still get some feedback from consumers, it has 
been focusing on getting product feedback from 
Technology Adoption Program (TAP) and Rapid 
Deployment Program (RDP) customers. These are 
predominantly enterprise customers. Although the 
names have changed, Microsoft’s relationships with 
enterprise customers closely parallels the relation- 
ships IBM had with customers in the past. 

Don’t get me wrong. The enterprise focus isn’t 
in itself a bad thing, and, in fact, the enterprise 
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evolution seemed inevitable so that Microsoft could 
grow its products and its market base. Enterprise 
customers tend to be sophisticated and have com- 
plex needs; their feedback leads to sophisticated and 
complex products. However, it’s a mistake to think 
that the solutions designed for the enterprise will 
work for small businesses. An enterprise will readily 
accept complexity if it solves an important pain point 
or addresses a key business issue. But features that 
solve an enterprise’s problems often add needless 
complexity for a smaller organization. This is why a 
product for ISVs like SQL Server Express (a favorite 
of mine) is having a tough time competing with the 
much smaller, simpler, open-source MySQL. Even 
Microsoft products that supposedly target the small 
business, such as Windows Small Business Server, 
aren’t really simple. Microsoft added wizards to try to 
mask the product’s complexity, but at the end of the 
day, it’s really designed for a sophisticated consultant 
who manages a small business’s infrastructure. 
Today, many Microsoft products stand at the 
brink of the maximum complexity and price that 
many small businesses don’t have the bandwidth 
to handle. Let’s see what’s happening with price: 
Microsoft has dropped the more affordable Visual 
Studio 2008 Standard Edition and replaced its spot 
in the lineup with the new high-end Visual Studio 
2010 Ultimate Edition. A license for SQL Server 
2008 R2 Standard Edition will be about $1,500 more 
expensive than one for SQL Server 2008 Standard 
Edition. And a license for SQL Server 2008 R2 
Enterprise Edition will be about $3,800 more ex- 
pensive than one for SQL Server 2008 Enterprise. A 
large business could easily swallow these costs, but a 
small business would have a much tougher time. 
Although catering to the enterprise brings Micro- 
soft profit today, you have to wonder what effect this 
approach will have on its small business customers. 
Will those customers opt for cheaper open-source or 
new cloud computing options as a way out of deal- 
ing with the ever-escalating price and complexity 
of Microsoft’s product lines? What do you think? 
Is Microsoft leaving small businesses behind? Tell 
me how these changes have affected you at motey@ 
sqlmag.com or letters@sqlmag.com. SQL 
InstantDoc ID 103615 


Michael Otey 


(motey @ sqlmag.com) is technical director 
for Windows IT Pro and SQL Server 
Magazine and author of Microsoft SQL Server 
2008 New Features (Osborne/McGraw-Hill). 
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Update the Statistics on 
Recently Modified Tables Only 


SQL Server 2008 and SQL Server 2005 produc- 

tion database servers. The data in our production 
environment constantly changes. Updating the statis- 
tics on the tables where a data modification took place 
is important because it helps the queries that rely on 
those tables run faster. 

Our databases range in size from 50GB to more 
than 1TB, so running the UPDATE STATISTICS 
statement on an entire database is time-consuming. 
Although some data changes involve many tables 
(and hence millions or even billions of rows) in a given 
database, other data changes involve only a handful 
of tables. Why run UPDATE STATISTICS on all 
the tables when only a handful of tables had data 
modifications? 

To save time and resources, I wrote a stored proce- 
dure, sp_DBARunUpdateStats, that updates statistics 
only on those tables where the data has been recently 
changed by INSERT, DELETE, or UPDATE state- 
ments. (I also wrote a related stored procedure, 
sp_DBAGetUpdateStats, that reports on the last 
time the UPDATE STATISTICS statement was run 
against all the table indexes in a database. You can 
read about this stored procedure in the sidebar 
“Stored Procedure Provides UPDATE STATISTICS 
Report.”) After I show you how to run sp_DBARun- 
UpdateStats, I'll explain how it works. 


i work with multiple customer databases in several 


How to Run 

sp_DBARunUpdateStats 

The sp_DBARunUpdateStats stored procedure is 

designed to reside in the master database. That way, 

you can invoke this stored procedure from Query 

Analyzer regardless of the database the current session 

is using. You can also invoke sp_DBARunUpdateStats 

from a SQL Server Agent job. (If desired, you can store 

sp_DBARunUpdateStats in another database as long 

as the call to the stored procedure is fully qualified.) 
The stored procedure takes four parameters: 

e @DBName. You use this mandatory first para- 
meter to specify the name of the database that 
contains the tables you want to update. 

e @ModifiedTableOnly. You use this optional 
second parameter to indicate the scope of the 
update. You specify 'Y' when you want to update 
the statistics on only those user tables where the 
data was recently modified (i.e., modified since the 
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last time UPDATE STATISTICS was run). You 
specify 'N' when you want to update the statistics 
on all the user tables. The default value is 'N'. 

e @RunSPRecompile. You use this optional third 
parameter to indicate whether you want to run the 
sp_recompile system stored procedure on the updated 
tables ("Y') or not (N'). The default value is "Y". 

e @DisplayOutput. You use this optional fourth 
parameter to indicate whether you want the output 
displayed (‘Y’) or not (‘N'). The default value is "Y". 


Saravanan 
Radhakrishnan 


If you don’t specify an optional parameter, its 
default value is used. So, for example, if you want to 


To save time and resources, | wrote a 
stored procedure that updates statistics 
only on those tables where the data has 
been changed recently. 


update the statistics on all the tables in the Adventure- 
Works database, run sp_recompile on those updated 
tables, and display the output, you'd use the command 


ORE on the WEB 


Download the code at InstantDoc 
IDs 103455 and 103456. 


EXEC sp_DBARunUpdateStats 
@DBName = 'AdventureWorks' 


If you want to update the statistics on only those Adven- 
tureWorks tables that had their data recently modified, 
run sp_recompile on those updated tables, and display 


the output, you’d use the command 
a Fditor’s Tip 


Share your SQL Server code, 
comments, discoveries, and 
solutions to problems. Email 
your contributions to r2r@ 
sqlmag.com. Please include 
your full name and phone 
number. We edit submis- 
sions for style, grammar, 
and length. If we print your 
submission, you'll get $100. 
—Karen Bemowski, 
senior editor 


EXEC sp_DBARunUpdateStats 
@DBName = 'Adventureworks', 
@ModifiedTableOnly = 'Y' 


If you want to update the statistics on only those 
AdventureWorks tables that had their data recently 
modified, but not run sp_recompile or display the 
output, you'd use the command 


EXEC sp_DBARunUpdateStats 
@DBName = 'Adventureworks', 
@ModifiedTableOnly = 'Y', 
@RunSPRecompile = 'N', 
@DisplayOutput = 'N' 
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How sp_DBARunUpdateStats 
Works 

Here’s the overall process that sp_DBARunUpdateStats 
uses to update statistics: 

1. The stored procedure creates two temporary 
tables. 

2. The stored procedure populates the first 
temporary table with the names of all the tables 
and their indexes in the specified database. 

3. The stored procedure checks to see whether you 
specified that you wanted to update the statistics on only 
those tables where the data was recently modified. If you 
did, it populates the second temporary table with the 
names of the tables where the data has changed recently, 
then deletes any entries from the first temporary table 
that don’t have a match in the second temporary table. 

4. The stored procedure runs the UPDATE 
STATISTICS statement against each table listed in 
the first temporary table. 


To achieve step 3, sp DBARunUpdateStats takes 
advantage of two features that have been available 
since SQL Server 2005: the STATS_DATE function 
and the sys.dm_db_index_usage_stats dynamic 
management view (DMV). 


The STATS_DATE Function 

To determine the last time UPDATE STATISTICS 
was run on a table, the stored procedure uses the 
STATS_DATE function, which returns the date and 
time of the most recent statistics update for a table. 
This function’s syntax is 


STATS_DATE (object_id, stats_id) 


where object_id is the ID of the target table and 
stats_id is the ID of the statistics object. You can 
obtain the statistics object’s ID from the index_id 
column in SQL Server’s sys.indexes catalog view for 
the specified table. 

In sys.indexes, a table will have a row for each 
of its indexes and a row representing the table 
itself (assuming it’s stored in a heap). The index_id 
column will contain a value of 0 (index_id = 0) when 
it represents the table. The index_id column will 
contain a value greater than 0 (index_id > 0) when 
it represents an index. So, for example, a table with 
three indexes will have four rows: one row that reads 
index_id = 0 and three rows that read index_id > 0. 

For each index_id > 0, you can have the STATS_ 
DATE function get the date and time when the last 


STORED PROCEDURE PROVIDES UPDATE STATISTICS REPORT 


Knowing the last time the UPDATE STATISTICS statement was run against all the table indexes in a database can be useful information in and of itself, 
so | wrote sp_DBAGetUpdateStats to report this information. After this stored procedure retrieves the date and time of the last UPDATE STATISTICS 


run for each table index, it sorts the retrieved datetime values in reverse chronological order. Figure A shows sample results. 


The sp_DBAGetUpdateStats stored 
procedure is designed to reside in the 
master database, but you can store it 
in another database as long the call to 
the stored procedure is fully qualified. 
If you store sp_DBAGetUpdateStats in 
the master database, though, you can 
invoke this stored procedure from the 
Query Analyzer regardless of the data- 
base the current session is using. 

The sp_DBAGetUpdateStats stored 
procedure takes only one mandatory 
parameter: the name of the target data- 
base. For example, if you want to list 
the last time the UPDATE STATISTICS 
statement was run against all the table 
indexes in the AdventureWorks data- 
base, you’d use the command 


EXEC sp_DBAGetUpdateStats 
"Adventureworks' 


You can download sp_DBAGetUp- 
dateStats by going to www.sqimag.com, 
entering 103455 in the InstantDoc ID 
text box, clicking Go, and clicking the 
103455.zip hotlink. This stored proce- 
dure works on SQL Server 2008 and 
SQL Server 2005. 

InstantDoc ID 103455 
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SQLQuery1.sql ...krishnan (72))* | Object Explorer Details 


| EXEC sp_dbagetupdatestats 'msdb' 


ickupset 
dbo.DTA_reports_column 
dbo.DTA_reports_index 
dbo.DTA_reports_table 
dbo.DTA_reports_table 
dbo. backupmediatamily 
dbo.backupmediaset 
dbo. backupfilegroup 
dbo. backuptile 
dbo. backupset 
dbo.backupset 
dbo.MSdbms_datatype_mapping 
dbo.MSdbms_map 
dbo.DTA_reports_column 
dbo.DTA_reports_indexcolumn 


Index Name 


K__backupme__DACB59E4D04AFB25B 


~ PK_backupse_21F794A4B0E391095 


DTA_reports_column_index 
DTA_reports_indexindex 
PK__DTA_repo__7D5F018E12E8C319 
DTA_reports_table_index 
PK__backupme__0013C86F0880433F 
backupmediasetuuid 
PK__backupfi__760CD67412FDD1B2 
PK__backupfi__57D1800417C286CF 
backupsetuuid 

backupsetDate 
pk_MSdbms_datatype_mapping 
pk_MSdbms_map 
PK__DTA_repo_1441422F316D4439 
DTA_reports_indexcolumn_index 


Latest Statistics Run Date 
2010-01-13 16:18:17.483 
2010-01-13 16:17:59.180 
2010-01-07 19:10:09.160 
2010-01-07 18:15:30.023 
2010-01-07 18:15:29.410 
2010-01-07 18:15:29.073 
2010-01-05 00:06:25.923 
2010-01-05 00:06:25.860 
2009-12-31 14:36:27.340 
2009-12-31 14:36:27.280 
2009-12-18 09:29:32.790 
2009-12-14 00:00:00.617 
2009-12-01 01:17:47.917 
2009-12-01 01:17:47.690 
2009-08-28 09:03:19.353 
2009-08-28 09:03:19.283 
2009-08-28 09:03:19.227 


dbo.DTA_reports_index 


PK_DTA_tepo_40BC844124134F1B 


Fully. 


Sample results from sp_DBAGetUpdateStats 
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UPDATE STATISTICS was run for that particular 
index. Therefore, if you run a statement such as 


"SELECT STATS_DATE 
Cobject_id('dbo.MyTable'), 1)' 


the returned datetime value (e.g., 2009-08-16 
16:46:29.197) represents the last time UPDATE 
STATISTICS was run on that table. 


The sys.dm_db _ index_usage_ stats 
DMV 

The sys.dm_db_index_usage_stats DMV contains 
index usage information for user tables. To access 
this information for all tables and their indexes in a 
database, you’d use a query such as 


SELECT * FROM 
sys.dm_db_index_usage_stats 
WHERE database_id = 
db_id('MyDatabaseName') 


The result set would include counts of different types 
of index operations and the time each type of opera- 
tion was last performed. In other words, the result 
set would indicate how many times a particular 
index was used by queries. It would also indicate 
the last time an index was modified by an INSERT, 
DELETE, or UPDATE statement. (Note that the 
DMV statistics might not be complete because they 
can be lost when SQL Server stops or when other 
events occur, such as the dropping and re-creation 
of an object.) 

The DMV’s last_user_update column specifies 
when (date and time) the last data modifi- 
cation took place on an index. So, you can narrow 
the query by specifying that column with code 
such as 


SELECT last_user_update FROM 
sys.dm_db_index_usage_stats 
WHERE database_id = 
db_id('MyDatabaseName') 


If you want to check this information for a specific 
table, you can further refine the query by adding the 
table’s ID, as in 


SELECT last_user_update FROM 
sys.dm_db_index_usage_stats 
WHERE database_id = 
db_id('MyDatabaseName') 

AND object_id = 
object_id('dbo.MyTable') 


The returned datetime value represents the last time 
that table’s index was modified. 
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How sp_DBARunUpdateStats 
Uses the Features 

Here’ how sp_DBARunUpdateStats uses the 
STATS_DATE function and the sys.dm_db_index_ 
usage_stats DMV. For every table index, sp_ 
DBARunUpdateStats first uses the STATS_DATE 
function to find out the last time the UPDATE 
STATISTICS statement was run. Next, the stored 
procedure accesses the DMV’s last_user_update 
column to find out the last time that index was 
modified. 

The stored procedure then compares the two 
retrieved datetime values. If the datetime value 
returned by the STATS_DATE function is less than 
the datetime value found in the last_user_update 
column, the index needs to have its statistics 
updated. If that isn’t the case, the statistics don’t 
need to be updated for that index because no 
data has changed since the last time UPDATE 
STATISTICS was run. 


On tables with 100 million or more rows, 


sp_DBARunUpdateStats samples only 10 


million rows to see whether UPDATE 


STATISTICS has been run recently. 


Sampling isn’t used on tables with less than 


100 million rows. 


This process not only determines whether 
UPDATE STATISTICS needs to be run but also 
has an added bonus: The stored procedure qualifies 
those tables and their indexes to which the data is 
loaded for the very first time. 


Give It a Try 
If you'd like to try sp_DBARunUpdateStats, you 
can download it from the SQL Server Magazine 
website. Go to www.sqlmag.com, enter 103456 in 
the InstantDoc ID text box, click Go, then click the 
Download the Code Here hotlink. This stored pro- 
cedure works on SQL Server 2008 and SQL Server 
2005. On tables that have 100 million or more rows, 
the stored procedure will sample only 10 million rows 
to see whether UPDATE STATISTICS has been run 
recently. Tables that have less than 100 million rows, 
the stored procedure runs update statistics without 
using any sample option. SQL 
—Saravanan Radhakrishnan, 
senior production DBA, 
Revionics 
InstantDoc ID 103456 
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What You Back Up Is What 


You Restore 


’m trying to come up with a database maintenance 

| plan that runs as fast as possible. I’ve been told that 

restoring a full backup will reorganize the indexes 

so that I don’t have to worry about index maintenance. 

Can you tell me what other operations are performed 
when a backup is restored? 


There are quite a few misconceptions around backup 
and restore, and this is one of them. Simply put, you 
can think of backup and restore operations as “what 
you back up is what you get when you restore.” There- 
fore, no optimizations are 
performed when the restore 
writes the data file pages into 
the database’s data files. 
The backup operation 
(whether full, differential, 
or a log backup after a 
minimally-logged opera- 
tion) reads the data file 
pages from disk and doesn’t 
even look at them, except 
to check any existing page checksums if you used 
the WITH CHECKSUM option in SQL Server 2005 
and later. When the backup is restored, the data file 
pages are written back into the database’s data file 
at exactly the same page offset and location as when 
they were read by the backup. The only changes that 
are potentially made to the data file pages are those 
required to perform recovery on the database to bring 


File fragmentation can 
have a small effect on 
scan performance but 
magnitudes less than 
index fragmentation. 


it to a transactionally consistent state when the restore 
sequence is complete. 

It’s possible that the restored copy of the database 
might have less NTFS-level file fragmentation if it’s 
restored on a volume (or volumes, for multiple data- 
bases’ data files) with enough contiguous free space 
to accommodate the new file(s). File fragmentation 
can have a small effect on scan performance but 
magnitudes less than index fragmentation within 
the database’s data files. (You can learn more about 
how backup and restore operations work by viewing 
the Backup/Restore cat- 
egory of my blog at www 
-sqlskills.com/BLOGS/ 
PAUL/category/Backup 
Restore.aspx.) 

A related question I 
often get asked is whether 
it’s possible to avoid 
performing consistency 
checks when using the 
WITH CHECKSUM 
option on backups of databases created on SQL 
Server 2005 or later where page checksums are 
enabled by default. Again, the answer is no. The 
reason is that it’s possible that a page was corrupted 
in memory and then written out to disk with a valid 
checksum. Although the page appears to be valid, 
it is, in fact, corrupt, which only DBCC consistency 
checks can detect. 


Comparing Execution Plans 


Display Estimated Execution Plan option enabled 

instead of the Include Actual Execution Plan 
option enabled, the plans look the same. I don’t fully 
understand the difference between these two plans and 
when I should use one over the other. In addition, when 
I compare the two plans, I see that their relative cost 
in the batch is 50 percent (each), but one query takes 
significantly longer to run. Why is that? 


i noticed that when I run a query with the 


These are excellent questions because both issues 
create a lot of confusion. Although the estimated 
plan and the actual plan will have the same general 
plan details (e.g., index usage, join types), SQL Server 
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doesn’t have to execute the query to generate the 
estimated plan. The estimated plan lets you see the 
plan that would have been chosen without having to 
wait the time it takes for the query to execute. Because 
the two plans will look the same, you might think it’s 
better to always use the estimated plan. Although it’s 
helpful to see the plan that SQL Server would have 
chosen, the estimated plan doesn’t tell you the effects 
(or efficiency) of this plan during runtime. 

When you hover over something (e.g., a table) 
within the showplan output, you'll get more infor- 
mation about the operation via the tooltip box. The 
actual execution plan lets you see all of the runtime 
characteristics. Figure 1 shows the estimated plan, 
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3 Execution plan 


(relative to the batch): 
SELECT m.* FROM dbo.member AS m WHERE m. lastname = 
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Logical Operation Index Seek 
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Estimated Number of Executions 1 
Estimated Operator Cost 0,0032831 (50%) 
Estimated Subtree Cost 0,0032831 
Estimated Number of Rows 1 
Estimated Row Size 20 6 
Ordered True 
Node ID 1 
Object 


[credit].[dbo].[member].[LastNamelnd] [m] 


Output List 
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Seek Keys[1]: Prefix: [credit].[dbo].[member].lastname = 


Scalar Operator(‘Tripp') 
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Query 1: 
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SELECT [m] .* FROM [dbo].[member] [m] WHERE [rm] .[lastname]=@1 
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Scan a particular range of rows from a nonclustered 
index, 


[member]. [member_ident] 


Cost: 50 $ 
Physical Operation Index Seek 
Logical Operation Index Seek 
Actual Number of Rows 0 
Estimated I/O Cost 0.003125 
Estimated CPU Cost 0.0001581 
Estimated Number of Executions 1 
Number of Executions 1 
Estimated Operator Cost 0.0032831 (50%) 
Estimated Subtree Cost 0,0032831 
Estimated Number of Rows 1 
Estimated Row Size 208 
Actual Rebinds 0 
Actual Rewinds 0 
Ordered True 
Node ID 1 
Object 
[credit].[dbo],[member].[LastNamelInd] [m] 
Output List 


[credit].[dbo].[member].member_no, [credit].[dbo]. 


[member].lastname 
Seek Predicates 


Seek Keys[1]: Prefix: [credit].[dbo].[member].lastname = 


Scalar Operator(‘Tripp') 


Figure 2 


Actual execution plan 
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and Figure 2 shows the actual plan 
used during execution. Notice that 
the query plan (Index Seek to a Key 
Lookup in a clustered index) is the 
same in both the estimated plan 
and the actual plan. However, the 
tooltip box in the actual plan has a 
few more rows, which contain the 
actual runtime values. Although 
gathering the runtime values takes 
more time and will make your que- 
ries slower (they’re executed with 
statistics being gathered during the 
execution, which does impact query 
performance), it’s the most effective 
way to troubleshoot your query if 
you experience poor performance. 

The most important of the run- 
time characteristics are the statistics 
on which the plan was based. You 
don’t directly see the statistics; how- 
ever, you can see the factors that led 
up to the plan choice and begin to 
infer whether a plan was optimal, 
and why. 

In the actual execution plan, the 
most important information is in the 
tooltip box. When query plans are 
generated, there are multiple phases 
that SQL Server goes through. During 
the optimization phase, SQL Server 
estimates the number of rows that 
will be processed by each operation. 
This estimate helps you determine the 
costs of various execution plans and 
ultimately the choice of execution 
plan. If the estimate is incorrect and 
off from the actual, especially by a 
large amount (e.g., the actual is more 
than two or three times the estimate), 
that implies SQL Server didn’t have 
an accurate picture of how much data 
was going to be affected. 

When that’s the case, two things 
are problematic. First, the plan might 
not have been ideal and you might 
have suffered poor performance as a 
result. Second, the showplan results 
for percentages within the statement’s 
showplan, as well as between the 
batches (if multiple statements are 
executed), aren’t likely to be correct, 
and, in fact, they might be grossly 
inaccurate. This can be especially true 
with stored procedures—and answers 
the second part of your question. If 
two statements (or procedures) are 
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executed and they have the same plan, then they'll each 
have a cost of 50 percent of the overall plan cost. How- 
ever, if one query or procedure takes more time than 
the other, double-checking its runtime characteristics 
might provide the answer. For example, in Figure 3, 
you can see that the same SELECT statement was 
executed twice. The result shows 50/50 as the costs for 
each execution. And although the plan is the same, and 
therefore the costs would be equal, the second plan 
took more time. With a closer look, you can see that the 
second execution’s tooltip value for Actual Number of 
Rows is 4,999, whereas the estimate was only 1, which 
was shown in Figure 2. This value is the cause of poor 
performance and the inaccuracy in the costs shown by 
the showplan. 

I like the speed with which I can see and review 
an estimated plan via the Display Estimated Execu- 
tion Plan option in SQL Server Management Studio 
(SSMS). However, I prefer the actual showplan 
displayed via the Include Actual Execution Plan 
option when I need more detail or when trouble- 
shooting performance problems. Also, I always 
double-check the runtime characteristics to make 
sure they’re relatively accurate during analysis and 
before I can trust a showplan (and its batch results). 
If they aren’t accurate, then I know that I can’t trust 
the comparisons between the batches. 


What’s Reading 
Log? 


have a fairly heavy update workload in my data- 

base, and I have transactional replication enabled. 

When I’m monitoring disk activity, I’ve noticed 
that there are a lot of reads, as well as writes, on the 
transaction log. Why is that? 


This is a question we get asked a lot—why is some- 
thing reading from my transaction log? Many times 
I’ve seen it stated that the transaction log is “write- 
only” or “write-mostly,” but that isn’t true. There 
are many processes and features that can cause reads 
of the transaction log, and transactional replication 
is one of them. 

Transactional replication works by having a SQL 
Server Agent job periodically read the publication 
database’s transaction log on the Publisher to find 
committed changes to tables in the various publica- 
tions you’ve defined. The changes are put into the 
distribution database on the Distributor for subsequent 
pushing or pulling to the subscription database(s) on the 
Subscriber(s). This process generates some of the read 
activity you're seeing on your transaction log. (You 
can get a 200-level overview of transactional replica- 
tion and other high-availability features in my MSDN 
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Comparing executed stored procedure calls 


My Transaction 


white paper “High Availability with SQL Server 2008” 
(msdn.microsoft.com/en-us/library/ee523927.aspx). 

The most common cause of read activity on the 
transaction log is a backup operation. Transaction 
log backups (obviously) read all the transaction log 
records generated since the last transaction log backup 
completed. Full and differential backups must also 
read some transaction log records to guarantee that 
the restored database is transactionally consistent. 

One feature that you might not expect to cause 
transaction log reads is a database snapshot (one you 
create manually or the one that’s created automati- 
cally when you run a DBCC CHECK* command). 
When a database snapshot is created, crash recovery 
is run on the database that the snapshot is based on, 
but the recovery is run into the database snapshot, 
making it a transactionally consistent, point-in-time 
view of the source database. 

There are quite a few other things that can 
cause transaction log reads. See my recent blog post 
“What can cause log reads and other transaction log 
questions” (www.sqlskills.com/BLOGS/PAUL/post/ 
What-can-cause-log-reads-and-other-transaction-log- 
questions.aspx) for more information. SQL 
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HOW YOUR 
INFORMATION 
GOES TO WORK 


zarn more at www.EMC.com/sal. 


en years ago, Microsoft revolutionized 
i business intelligence (BI) by making OLAP 
accessible to the IT professional right in the 
SQL Server relational database product. Now the 
company wants to make BI available throughout 
an organization. What’s next for BI? Tom Casey, 
the Microsoft general manager for SQL Server BI, 
met with SQL Server Magazine’s Technical Director 
Michael Otey and Executive Editor Sheila Molnar to 
discuss Microsoft’s BI strategy—the history of BI, its 
vital role in the SQL Server 2008 R2 release, and its 
future. For an in-depth look at the SQL Server 2008 R2 
release, watch for our SQL Server 2008 R2 new features 
article in the June issue of SQL Server Magazine. 


SQL Server Magazine: When we spoke with Donald 
Farmer in 2009, we asked him about his background, and 
we'd like to start there with you as well. Did you move 
from being a relational database guy to a BI guy? 


I was a file server database guy for a 
long time. I worked on embedded systems, and then 
I transitioned to relational databases prior to coming 
to Microsoft to work on SQL Server 7.0. I focused 
on our relational technologies, on replication, and 
on distributed computing. Working on file systems 
and embedded systems early on, I got the notion 
of connection to the end user that’s critical for BI. 
So when the opportunity came to work on making 
accumulated relational data [available] to more users, 
which is a big part of BI, I jumped at it. It’s an exciting 
convergence. 


SQL Server Magazine: Can you tell us more about 
your role as the leader of the cross-group BI effort at 
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Microsoft? Can you describe how this virtual team 
works and how that effort is progressing? 


We focus on making BI pervasive and available 
to everyone. We want users to get value out of the data 
that they already have, but we also want to turn that 
data into broadly available information that informs 
daily business productivity. We don’t want to deliver 
a suite that requires specialized training and tools for 
BI; we want to deliver something that’s ingrained in 
the tools that people use every day. Not just the tools 
that the information worker uses in Office, but also 
SharePoint as the infrastructure that the IT profes- 
sional relies upon, and SQL Server as a mission-critical 
platform for the whole thing. We don’t want specializa- 
tion in the stack because it gets in the way of users 
consuming and getting value out of what they need. 

At Microsoft, we’ve organized ourselves so that we 
don’t deliver BI in just one team but we do it in a virtual 
team comprised of leaders and developers in Office, 
SharePoint, and SQL Server. We drive a holistic virtual 
engineering team. First, we identified key leaders— 
general managers in the Access, Excel, and SharePoint 
teams; me in SQL Server; a marketing counterpart and 
a distinguished engineer, Amir Netz. We focus on BI as 
a whole for Microsoft, giving teams guiding principles 
on the necessary priorities. We work closely to make sure 
that what we're delivering is consistent and compelling. 
There are end-to-end scenarios or experiences that guide 
what we're doing in the SQL Server 2008 R2 release. We 
communicate those things broadly and make sure that 
we're adhering to those principles. We're shipping Office 
2010 at the same time that we're shipping SQL Server 
2008 R2. It’s the first time in 10 years that we’ve shipped 
those two products together. 
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“The SQL Server 2008 

R2 PowerPivot add-in for 
Excel lets users simply start 
working in Excel, an envi- 


MICROSOFT?’S BI STRATEGY 


SQL Server Magazine: What’s 
Microsoft’s strategic thinking 
behind BI in SQL Server 2008 
R2? 


Casey: The strategic initiative 
here is to make sure we serve 
the needs of all the roles in the 
organization. The consumers of 
BI, the IT professionals that put 
the systems and infrastructure 
together, and the developers and 
analysts who build the solu- 
tions that people consume. We 
deliver it in the right place in the 
stack so that you have familiar 
tools and the right tools. We're 
aligning release cycles and devel- 
opment efforts to make sure that 
BI comes through as a set of 
holistic experiences from Microsoft. It is a key thing that 
will differentiate us from what other organizations do. 


SQL Server Magazine: How does an organization get 
started with BI? Do they have to become OLAP experts? 


Casey: There’s a much 
faster path available for 
them. Requiring the 
whole organization—the 
IT professional, the 
developer, and the end 
user—to become experts 
in a tool that’s special- 
ized for an OLAP engine 


ronment that they're familiar Sn inhibitor. The SQL 


with. We've masked the fact 
that there’s a very powerful 
OLAP engine underneath.’ 
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Server 2008 R2 Power- 
Pivot add-in for Excel 
lets users simply start 
working in Excel, an 
environment that they’re 
familiar with. We’ve 
masked the fact that 
there’s a very powerful OLAP engine underneath. 
People will start with experiences that are ingrained 
in their daily work, and they won’t have that abrupt 
sense of making a transition to becoming a BI devel- 
oper or BI user. 

Making this seamless is key to our information 
platform vision. The second thing that provides a 
faster path is reporting. In the past, reporting has been 
separate but related: You build your application, and 
then you build your reports. You never got any leverage 
between the two. By driving to a common model, user- 
accessible tools, and a shared platform infrastructure 
for reporting and OLAP, we made it possible to build 
a BI solution and generate a report off of it with the 


new Report Builder. And vice versa—you can take a 
report that’s off of relational data and turn it into a 
data feed that lets you do rich analysis. 


SQL Server Magazine: We've seen a huge uptick with 
our readers using SQL Server Reporting Services. The 
same thing could work with PowerPivot for Excel. 


Casey: They can’t be separate and isolated experi- 
ences. You have the appropriate tools—Report Builder 
to design and create reports and present informa- 
tion broadly to users, and Excel to design solutions 
building and analysis. They need to be complementary 
and interoperate well. 


SOL Server Magazine: How is Microsoft’s vision for 
BI different from that of its major competitors? 


Casey: Our vision 1s about breadth of adoption and sup- 
porting higher-end utilization. Where we differ from some 
competitors is that we'll rely on our rich partner ecosystem 
to build and deliver the specialized solutions that people 
want in vertical markets. That’s very different from other 
vendors who focus on specialized tools and applications. 
We focus on breadth: We want to make it possible to 
drive BI from 10 to 20 percent of users in an organization 
gaining value from their BI tools toward something that’s 
in the range of 40, 60, or 80 percent of people. 


SOL Server Magazine: That opens up opportunities 
for smaller companies that want to get in this game. 


Casey: SOL Mag readers, IT pros, as well as devel- 
opers benefit from organizations that standardize 
on the BI platform as a part of their infrastructure. 
That’s part of what we’re driving at with SQL Server 
and SharePoint. We want to make BI part of the 
infrastructure and a set of services that are available 
for any app you build and consume in an organization. 
By optimizing for that, we'll allow that ecosystem for 
our partners and custom application builders. 


SQL Server Magazine: Can this approach to BI pro- 
mote career growth throughout an organization? 


Casey: I absolutely think it’s a career-growth oppor- 
tunity for IT professionals. For more than 10 years, 
we've been asking the SQL Server community to 
develop expertise beyond the relational engine. Now 
we deliver on the vision that BI is really part of the 
infrastructure that enables new capacity and new 
classes of apps, and the expertise that they’ve gained 
by cross-training themselves suddenly allows them to 
enable new classes of agility and decision-making. 


SQL Server Magazine: What opportunities are available 
for BI developers? 
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Casey: There are two classes of developers in the BI 
realm. We've got to serve both and merge them. One 
class is the people who build the specialized BI app—the 
traditional budgeting, planning, forecasting, and score- 
carding application. And then there’s the other class, 
where there’s opportunity around data visualization 
and reporting. We've taken steps to enrich the data plat- 
form at the same time as we're building the BI tooling. 
Look at data visualization, charting controls, and gauge 
controls. They're available in the Report Designer for 
the developer who wants to write a report that gener- 
ates RDL that uses those components, but they're also 
available in .NET, and they follow the same common 
.NET programming model. You can access them in 
your project system normally just by referencing those 
assemblies. Developers can take an incremental step, one 
that’s small compared to the value they get from doing 
reporting that’s well-integrated with their .NET applica- 
tions. That provides a foundation where the semantic 
models they're building and the skills they’ve developed 
are transferable to custom or traditional BI applications. 
We're going to keep investing in that. 


SQL Server Magazine: So would those things involve 
Windows Presentation Foundation (WPF) and 
Silverlight? 


Casey: If you want to hint at futures, I think your 
readers would like that. I’m a big believer in Silverlight 
and WPF and carrying those things forward. I think 
you'll see some investments in that direction. 


SQL Server Magazine: How about BI for SharePoint? 


Casey: You and your readership know that SharePoint 
has become a very critical piece of data access, access 
to BI information, and delivery of that information. 
More and more organizations are asking how to 
deliver BI through SharePoint. Last November was 
the first time we had a BI presence at the SharePoint 
Conference. We were mobbed on the show floor. That 
audience has already figured out that reporting is 
integrated and is taking advantage of it. Just as in the 
old days when we used to expect the systems adminis- 
trator and the DBA to work together, the SharePoint 
administrator and the DBA are becoming two critical 
pieces that fit together and overlap very well. 


SQL Server Magazine: You're saying that many of 
your inquiries are related to SharePoint? 


Casey: Some people see SharePoint as a way to get to 
data. The platform underneath that stores data, allows 
you to access it, and provides enrichment services, like 
analytics, is SQL Server. What we've done in SQL Server 
2008 R2, and what we'll continue to do, is drive an inte- 
grated set of experiences. I want the IT pro to know that 
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SQL Server is key, and that Share- 
Point is key to the BI story. 


SQL Server Magazine: Many 
readers are interested in surfacing 
management dashboards through 
SharePoint for SQL Server BI 
information and reports. 


Casey: That’s where it starts. 
With PowerPivot integration, 
not just with Excel but with 
PowerPivot for SharePoint, we 
give the IT administrator the 
ability to gain visibility into and 
exert some control over those 
reporting applications. We make 
those manageable through the 
SharePoint environment with 
the PowerPivot add-in for Share- 
Point. We make it possible for the IT admin to under- 
stand how data is being reused in the organization and 
create a cycle for turning data into a new information 
asset that is published back to SharePoint, which is 
implicitly backed by SQL Server Analysis Services. 


SQL Server Magazine: Can you explain what Master 
Data Services is all about? 


Casey: The addition of Master Data Services to the 
integration portfolio that underlies BI is brand-new with 
SQL Server 2008 R2, and it’s getting a lot of good initial 
buzz and traction from our user community. One of the 
most common things organizations run into when they 
try to get consistent reporting done over a period of time 
is the fact that hierarchies, dimensions, and perspectives 
change. For example, say you’ve acquired another com- 
pany that looked at the sales regions slightly differently 
than you do. That creates a mismatch and a boundary 
that you can’t report across. Master Data Services lets 
you map and create a master view that defines the hierar- 
chies you want to project and work with going forward. 

You have the model that you want to present to the 
organization, and you map that to one or many dif- 
ferent models underneath. That lets you look at more 
dimensions and more attributes in a very consistent 
way. You can create an infrastructure using Master 
Data Services that gives you a consistent view of not 
just the schema that you want to present (the hierar- 
chies you care about) but also the data that’s relevant 
for those things. We're providing the infrastructure 
that’s necessary for people to build master data man- 
agement solutions, which are almost always custom to 
each organization. And hence we talk about Master 
Data Services just like we talk about providing SQL 
Server Reporting Services, which lets people build 
reports. 
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SQL Server Magazine: Would 
you call this a data integration 
product? 


Casey: It’s part of SQL Server. 
It’s another service in the box 
that’s very closely related to 
SQL Server Integration Services 
[SSIS]. In fact, it uses SSIS 
for data movement. It’s closely 
related but it’s a discrete and sep- 
arate service. We’ll continue to 
add more services as it’s relevant 
to enrich the data platform. 


SOL Server Magazine: Let’s dis- 
cuss the Parallel Data Warehouse. 
Do you see any challenges with 
getting people to buy an appli- 
ance along with the software? 


Casey: I see it as an opportunity. We entered the ref- 
erence architectures business in the SQL Server 2005 
release where we announced partnerships with HP, 
Dell, and others. We refreshed those partnerships with 
the SQL Server 2008 release, and we’ve announced 
even more people working with SQL Server for the 
reference architectures or appliance-like offerings on 
our Fast Track Data Warehouse solutions. Those 
solutions are very popular with customers—the com- 
plexity of getting high-scale systems matched between 
processing hardware, memory, and disk layouts, along 
with properly configuring the software in it to run on 
those things, takes time for IT professionals. It’s a big 
up-front cost that carries most of its cost one time. 

The partners are optimizing themselves and 
working with us to optimize SQL Server running in 
those configurations. Those vendors provide dedicated 
solutions that just work out of the box, so you’re up 
and running quickly. Where parallel data warehousing 
comes in is the ability to scale out. To take an incre- 
mental step to expand, you add another appliance-like 
node into the mesh. It extends and adds your ability to 
scale out as well as scale up, which SQL Server already 
had. Over the last couple of years, we’ve worked 
closely with our partners around the total package 
solution and how that lands in the data center. This 
lets you get better TCO 
in that total package. 
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SQL Server Magazine: Is 
the Parallel Data Ware- 
house targeted at a cer- 
tain size of organization? 
And when a customer 
has support needs, to 
whom do they look? 


Casey: Both great questions. In terms of the size of the 
organization: It’s less about size in terms of people or even 
revenues, and it’s much more about data. You can have 
a fairly small data clearinghouse that has huge volumes 
of data. Look at web analytics companies. SQL Server 
scales up very well today, as evidenced by what folks like 
PREMIER Bankcard are doing with teens of terabytes 
of data deployed in their environment currently via 
scale-up with SQL Server 2008. Into the tens of terabytes, 
we do alright. But if you’re in the many tens or into the 
hundreds of terabytes or a petabyte of information, those 
cases are where parallel data warehousing is the preferred 
solution. You can use a Fast Track solution as a spoke 
or a node in your Parallel Data Warehouse topology as 
well in order to provide some domain-specific focusing 
of data that you want to deliver for a particular depart- 
ment’s usage. We’ve been very deliberate about taking 
that acquisition and normalizing it on Windows and 
SQL Server so that it follows the same semantic and has 
many familiar manageability characteristics for our users. 
The roadmap for moving forward on both scale up and 
scale out is very clear, and we continue to articulate it to 
customers and partners. 

In terms of whom they call: We ship the software 
and we stand behind that software. And we’ve set up 
the programs very carefully so that when they call us 
they’re getting the right class of escalation that will 
support their solution and any handoff that needs to 
be done in either direction, regardless of where the 
call originated. Those handoffs with partners are as 
seamless as we can make them. It’s about enterprise 
commitment. 


SQL Server Magazine: You mentioned the Fast Track 
Data Warehouse; would you describe it for us? 


Casey: Fast Track Data Warehouse is a moniker 
around a set of reference architectures. They range 
from the single terabyte offering all the way into the 
many tens of terabyte offerings; 48 terabytes being 
the largest. They’re available from multiple different 
vendors: HP, Dell, and IBM. 


SQL Server Magazine: Do the vendors have different 
packages? How do the customers buy them? 


Casey: With the Parallel Data Warehouse, as it has 
been with Fast Track, we're offering choice because we 
know some of our customers have preferred vendors. 
They’ve standardized on them, and we want them to 
be able to continue to do that. If you’re an HP shop, 
continue to be an HP shop. If you’re a Dell shop, 
continue to be a Dell shop. If youre an IBM 
shop, continue to be an IBM shop. We don’t want to 
offer a whole separate solution. That’s why we don’t 
roll out a Microsoft-branded and stamped piece of 
hardware. We know people have good relationships; 
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we want them to use those relationships and offer 
them the right capability that’s optimized. 


SOL Server Magazine: Shifting gears a bit: Can you 
give us your perspective on using BI with the SQL 
Azure platform? 


Casey: We recently made some announcements 
about Windows Azure and SQL Azure: Starting in 
January, SQL Azure is broadly available for public 
deployments. Our roadmap says that we’ll continue 
to deliver additional platform services in that data 
platform tier under SQL Azure. The first offering is 
focused on the relational capabilities; subsequently 
we'll add reporting and analytics. We’ll add BI in a 
way that it surfaces in the right places. So you may see 
end-user experiences surfacing through Office online, 
and the back end of that being served out of analytics 
running in SQL Azure. 


SQL Server Magazine: So will these announcements 
be forthcoming this year? 


Casey: This year, we're focused around what’s coming 
with the compelling set of releases in the first half of 
the year: Office 2010 and SQL Server 2008 R2. We’ll 
begin talking about the roadmap for the next wave of 
things that we do in the cloud. In BI, most organi- 
zations are still struggling with getting their data 
together and making it usable inside the firewall. 

We're just seeing SQL Azure take off—the 
ability for someone to provision a database in the 
cloud and be up and running. Suddenly it’s there, 
and you don’t have to publish it to anybody else 
and move it around. It’s available from anywhere 
from any device. And it’s hugely powerful. Now 
bring that to your reports. Now bring that to your 
analytics. It’s just a natural evolution. 


SQL Server Magazine: It’s a new year and a new 
decade for BI. Based on the past 10 years, can you 
venture a few predictions about where BI will be 
at the end of the next decade? 


Casey: We were the first vendor to integrate OLAP 
Services right next to the data store. We did that 
with SQL Server 7.0, and we still have the leading 
OLAP server on the market. Good things have 
happened: There’s been consolidation of tools 
and consolidation in the market. But BI is still 
only reaching 10 to 20 percent of users in an orga- 
nization. I’m pleased, but I’m not at all satisfied. 
The future’s going to be about delivering not just 
capabilities and applications to an organization, 
but rich, compelling experiences. I'll be greatly 
disappointed if we end this new decade without 
BI at least having doubled in terms of its value to 
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an organization and the number of users it reaches to 
take advantage of it. I won't be surprised if people are 
accessing BI through any application that’s delivered 
anywhere, whether it’s on the cloud, on premises, or 
on a device. And they won’t even know that it’s BI. I 
think 10 years from now we'll be talking about BI as a 
natural part of business productivity and application 
usage. Today, we're still talking about a separate thing 
called BI. You need it to become part of the infra- 
structure of IT and part of the implicit IP [intellectual 
property] of information workers. It needs to become 
how businesses are agile and productive. 

I think we're on the cusp of a major change right 
now. It’s a pivotal point. It harkens back to your question 
about SharePoint earlier, where people have accepted 
applications and information in their organization as 
coming through a portal and not littered across a bunch 
of file shares or discrete applications. It means that 
there’s a place where users connect to users and users 
connect to data. Now we can drive a different kind of 
transformation—from where BI was something separate 
toward where it is truly integrated in applications. And 
that’s exciting to me. That’s what our mission really is. 


SQL Server Magazine: It looks like we have another 
exciting 10 years to look forward to! SQL 
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Aggregates Against 
Partitioned Tables 


An efficient workaround provides improved performance 


en requesting a MAX or MIN aggregate 
of a column in a nonpartitioned table, 
if you have an index on the aggregated 


column, the optimizer uses a plan that scans only one 
row in the index’s leaf. Consider the following query: 


SELECT MAX(col2) FROM dbo.T1; 


Suppose that T1 is a nonpartitioned table and that 
you created a nonpartitioned index on T1(col2). The 
optimizer will generate a plan that scans the leaf of the 
index from the tail backwards and stops after one row. 
Similarly, if you ask for MIN(col2), the optimizer will 
generate a plan that scans the leaf of the index from the 
head forward and stops after one row. In both cases the 
plan is extremely efficient and involves very little work. 

Partitioned tables and indexes are different; unless 
you apply the aggregate to the partitioning column, 
you'll get an inefficient plan in both SQL Server 2008 
and SQL Server 2005. In this article I explain how MAX 
and MIN aggregates against partitioned tables are opti- 
mized, and I provide a better-performing workaround. 


Sample Data and Test 
Environment 

Run the code in Listing 1 to generate the sample data 
that I use in my example. Note that this script will 
take a few minutes to run. The first part of the code 
in Listing 1 creates a database called TestMinMax. 
The second part creates a helper table function called 
GetNums, which returns a sequence of integers in the 
range | through @n, where @n is an integer input. The 
third part creates a partitioned table called T1, with the 
partitioning column being coll, and populates it with 
10,000,000 rows in five partitions. The code creates a 
clustered index on coll, partitioned the same as the 
table by coll, as well as a nonclustered index on col2, 
also partitioned the same as the table by coll. 

My test machine has an Intel Core 17 processor (quad 
core with hyperthreading—total of eight logical proces- 
sors), 4GB of RAM at 1333MHz, and one 7200RPM 
hard drive. The execution plans and performance 
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measures that I present are for queries against SQL Server 
2008 SP1 but are similar for SQL Server 2005 SP3. 


Querying the Partitioned Table 
If you ask for a MIN or MAX aggregate of the parti- 
tioning column (coll in our case), you get an efficient plan 
that scans only one row in the first or last nonempty parti- 
tion. As an example, consider the following query (call it 
Query 1) and its execution plan, which Figure 1 shows. 


SELECT MAX(col1) AS mx 
FROM dbo.T1; 


If you examine the properties of the Clustered Index 
Scan operator, you'll notice that only one partition of 
the index idx_coll is accessed (partition 5), and only 
one row is scanned in that index. I got the following I/O 
and time statistics for this query on my test machine: 


I/0: Table 'T1'. Scan count 1, logical reads 3 
Time: CPU time = Ø ms, elapsed time = 8% ms. 


This plan is extremely efficient. 

The results are different if you ask for a MIN or 
MAX of a column that isn’t the partitioning column. 
For example, there’s an index on col2 that’s parti- 
tioned by coll—the same way the table is partitioned. 
Consider the following query (call it Query 2) and its 
execution plan, which Figure 2 shows: 


SELECT MAX(col2) AS mx 
FROM dbo.T1; 


Note in the execution plan that the leaf levels of all par- 
titions of the index idx_col2 are scanned in full—a total 
of 10,000,000 rows. I got the following performance 
statistics for this query on my test machine: 


I/0: Table 'T1'. Scan count 11, logical reads 13798 
Time: CPU time = 3183 ms, elapsed time = 2739 ms. 


This plan is inefficient; it scans 10,000,000 rows. 
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A better strategy is to calculate a local, or partial, 
MAX(col2) aggregate within each partition by scanning 
only one row from the tail of the local index leaf, then 
calculate the global, or final, aggregate on top. Note 
that SQL Server uses a similar strategy when parallel- 
izing aggregate calculations in nonpartitioned cases but 
doesn’t use such logic with partitioned tables. 

To work around this problem, add a filter to the 
aggregate query that restricts it to only one partition, 
using the following form: 


WHERE $PARTITION.<partition_function>(<partitioning_ 
column>) = <partition_number> 


LISTING I: Code to Create Sample Data 


-- Part 1: Create sample database TestMinMax 


SET NOCOUNT ON; 
USE master; 


IF DB_ID('TestMinMax') IS NOT NULL DROP DATABASE TestMinMax; 


CREATE DATABASE TestMinMax 


GO 
USE TestMinMax; 
(e0) 


-- Part 2: Create helper function GetNums which generates a sequence of integers 
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums; 


GO 


CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE 


AS 
RETURN 
WITH 


LØ AS(SELECT 1 AS c UNION ALL SELECT 1), 

L1 AS(SELECT 1 AS c FROM LØ AS A CROSS JOIN LØ AS B), 
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), 
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), 
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), 
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), 


Nums ASC(SELECT ROW_NUMBER() OVERCORDER BY (SELECT NULL)) AS n FROM L5) 


SELECT TOP (@n) n FROM Nums ORDER BY n; 
GO 


-- Part 3: Create and populate partitioned table T1 
CREATE PARTITION FUNCTION PF1 CINT) 


AS RANGE LEFT FOR VALUES (2800000, 4000000, 6QOO000, BØØODDO) ; 


CREATE PARTITION SCHEME PS1 
AS PARTITION PF1 ALL TO ([PRIMARY]) ; 


CREATE TABLE dbo.T1 
¢ 

coll INT NOT NULL, 

col2 INT NOT NULL, 

filler BINARY(2@9) NOT NULL DEFAULT(@x@1) 
) ON PS1(col11); 


CREATE UNIQUE CLUSTERED INDEX idx_coll ON dbo.T1(col1) ON PS1(col1); 
CREATE NONCLUSTERED INDEX idx_col2 ON dbo.T1(col2) ON PS1(col1); 


INSERT INTO dbo.T1 WITH (TABLOCK) (coll, col2) 


SELECT n, CHECKSUM(NEWID()) FROM dbo.GetNums (10900009) ; 


Query 1: Query cost (relative to the batch): 100% 
SELECT MAX(coll) AS mx FROM dbo.T1; 


p= 
| i E] 
SELECT Stream Aggregate Top Clustered Index Scan (Clu... 
Cost: 0% (Aggregate) cost: 0% [T1]. [idx_col1] 
psa Cost: 0 % va Cost: 100 % 
Figure | 


Execution Plan for Query | 


In this case, the optimizer produces an efficient plan 
that scans only one row from the edge of the local index. 
Consider the following query (call it Query 3) and its 
execution plan, which Figure 3 shows: 


SELECT MAX(col2) AS pmx 
FROM dbo.T1 
WHERE $PARTITION.PF1(col1) = 1; 


Only one partition (partition 1) of the index idx_col2 
is accessed. The Index Scan operator scans idx_col2 
from the tail backwards, and the Top operator stops 
the scan after one row. This strategy is efficient. I got 
the following performance measures for this query: 


I/0: Table 'T1'. Scan count 1, logical reads 3 
Time: CPU time = 16 ms, elapsed time = 56 ms. 


To address the original need to calculate the 
aggregate against the entire table, you can write a 
query against a table that holds all partition num- 
bers (call it P) and use the APPLY operator to apply 
the logic presented in Query 3 to each row from P. 
Then you can apply a global aggregate of all local 
aggregates on top. If the set of partitions involved is 
small and static, you can construct P as a virtual table 
made of constants. In SQL Server 2008 this can be 
achieved with the enhanced VALUES clause like so: 


SELECT MAX(A.pmx) AS mx 
FROM (VALUES(1), (2), (3), (4), (5)) AS P(partition_ 
number) 
CROSS APPLY ( SELECT MAX(T1.col2) AS pmx 
FROM dbo.T1 
WHERE $PARTITION.PF1(T1.col1) = 
P.partition_number ) AS A; 


This query (call it Query 4) generates the execution 
plan in Figure 4. 

The plan implements the strategy that we hoped to 
see originally; the Constant Scan operator scans the 
five constants representing the five partition numbers 
in our table P, and for each, applies the aforementioned 
logic that calculates the local aggregate efficiently with 
only one row scanned in each of the local indexes. 
The second Stream Aggregate operator represents 
the global aggregate on top of the local ones. This 
plan is very efficient. I got the following performance 
measures for this query: 


Query 2: Query cost (relative to the batch): 100% 
SELECT MAX(col2) AS mx FROM dbo.Tl; 


I/0: Table 'T1'. Scan count 5, 
logical reads 15 


bis Time: CPU time = 15 ms, 
elapsed time = 134 ms. 


= pI 
QE 4 tea 
i | DE = ps = 
SELECT Stream Aggregate Parallelism Stream Aggregate Index Scan (NonClustered) 
bste D K (Aggregate) (Gather Streams) (Aggregate) [T1] . [idx_co012) 
n Cost: 0 % Cost: 0% Cost: 10 % Cost: 89 % 
Figure 2 


Execution Plan for Query 2 
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a derived table is a new capability in 
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SQL Server 2008. In SQL Server 2005, usea series of [RETEG"haxcteol21). [pnx] FROM [dbo] [T1] WHERE Spartition. [PF1]C[co11])=01 
UNION ALL operations like so: DE 
2 E 
SELECT Stream Aggregate Top Index Scan (Noncl ustered) 

SELECT MAX(A.pmx) AS mx cost: 0 % cay Cost: 0% "ener oT 
FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL 

SELECT 3 Figure 3 

UNION ALL SELECT 4 UNION ALL SELECT 5) Execution Plan for Query 3 

AS P(partition_number) 

CROSS APPLY ( SELECT MAX(T1.col2) AS pmx I/0: Table 'T1'. Scan count 5, logical reads 15, 
FROM dbo.T1 Table 'sysrowsets'. Scan count 1, logical reads 2 
WHERE $PARTITION.PF1(T1.col1) = Time: CPU time = Ø ms, elapsed time = 118 ms. 

P.partition_number ) AS A; 
You can calculate a MIN aggregate similarly: 

In many cases, the list of partitions isn’t static; instead, 

partitions are added and removed frequently. Instead SELECT MIN(A.pmn) AS mn 

of using a constant list of partitions, you can simply FROM sys.partitions AS P 

query the sys.partitions view dynamically like so: CROSS APPLY ( SELECT MIN(T1.co12) AS pmn 

FROM dbo.T1 
SELECT MAX(A.pmx) AS mx WHERE $PARTITION.PF1(T1.col1) = 
FROM sys.partitions AS P P.partition_number ) AS A 
CROSS APPLY ( SELECT MAX(T1.col2) AS pmx WHERE P.object_id = OBJECT_ID('dbo.T1') 
FROM dbo.T1 AND P.index_id = INDEXPROPERTY( OBJECT_ID 
WHERE $PARTITION.PF1(T1.col1) = ('dbo.T1'), 'idx_col2', 'IndexID' ); 
P.partition_number ) AS A 

WHERE P.object_id = OBJECT_ID('dbo.T1') For Now, the Workaround Works 

AND P.index_id = INDEXPROPERTY( OBJECT_ID In this article I describe an optimization shortcoming 

(‘dbo.T1'), ‘idx_col2', 'IndexID' ); in SQL Server 2008 SP1 and SQL Server 2005 SP3, 
related to MAX and MIN aggregate calculations 
This query (call it Query 5) queries the sys.partitions view, against a partitioned table. With partitioned tables, 
filtering only the partitions associated with the index the optimizer doesn’t use an index on the aggregate 
idx_col2 in the table dbo.T1. The rest is the same asin column efficiently unless the aggregated column 
Query 4. Figure 5 shows Query 5’s execution plan. is also the partitioning column. I provide a work- 
As you can see, this plan is very similar to the around that results an efficient plan. Hopefully, 
one for Query 4, only instead of the Constant Scan Microsoft will enhance SQL Server to address such 
operator, you see a Clustered Index scan of the clus- calculations more efficiently without the need for a 
tered index on the system table sys.sysrowsets. I got workaround. [SQL] 
the following performance measures for this query: InstantDoc ID 103574 
Query 4: Query cost pa! : 
SELECT MAX(A.pmx) AS mx FROM (VALUES(1), C2), e. (4), (5)) AS P(partition_number) CROSS APPLY ( SELECT MAX(Tl.co12) AS pm: 
Stream B ns cops 
(Aggregate) Creer 3pm) constant Sgan 
Stream Aaareaute Index Scan (NonClustered) 
(Aggregate) [T1] . [idx_co12] 
Cost: 0 % Cost: 99 % 
Figure 4 
Execution Plan for Query 4 
Query 5: Query cost (relative to the batc : 100% 
SELECT MAX(A.pmx) AS mx FROM sys.partitions AS P CROSS APPLY ( SELECT MAX(T1.co12) AS pmx FROM dbo.T1l WHERE SPARTITION.PF1(T| 
= j} a 
„siir tiroen Aios pasti beses — Siomai miteac, tin; 
Cost: 0% Cost: 1% Cost: 50 % 
i a 
aoorenste) Top iina EAT E 
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Execution Plan for Query 5 
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Its not easy, but this is your best option to set 


your timeouts 


hen it comes time to change the timeout of 
W a Visual Studio-generated TableAdapter 

Fill method, it would be nice if there were 
an exposed property in the designer to set an upper 
limit on the length of time the query should take. 
Unfortunately, there isn’t, and there isn’t likely to 
be one for many years (2012 at least). So you’re on 
your own when you set CommandTimeout for the 


Fortunately, there’s a 
reasonable way to build on 
this functionality to your 
generated code. 


SELECT and DML queries you’ve constructed to 
build and update your TableAdapter. 

TableAdapter has become a mainstream class 
to access rowsets generated from queries against 
SQL Server and other backend databases. Sadly, 
TableAdapter doesn’t expose the CommandTimeout 
property of any of the generated queries that the Table- 
Adapter manages. Sure, the ConnectionTimeout key is 
provided in the ConnectionString to govern how long 
the application waits to connect, but there’s no key/ 
value pair to set the Command timeout. By default, 
the generated Command objects invoked by the Fill 
method time out in 30 seconds. Yes, that’s long enough 
to return half the information in a database, but when 
working with comprehensive DML operations or 
more complex (or clumsy) queries, it might not be long 
enough. 

There are many cases where I know that 
a query that takes longer than a couple of sec- 
onds isn’t working correctly. Again, I don’t have an 
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opportunity with the Visual Studio-generated code 
to tune this timeout. Fortunately, there’s a reason- 
able way to build on this functionality to your 
generated code—that’s what I’m going to show you 
here. 


Looking Behind the Scenes in 
the Generated Code 

To get the Fill method Command class to time out when 
you want it to, you'll need to build a Partial class to extend 
the TableAdapter functionality. Ordinarily that would be 
easy, but there are a few gotchas that can blow up in your 
face if you're not careful. First, consider that the Table- 
Adapter’s generated code includes a CommandCollection 
class that contains all of the Fill methods for each data 
set it creates. So, if you create a TableAdapter against the 
NorthWind Customers data table (as shown in Figure 1), 


? CustomerlD 
CompanyName 
ContactName 
ContactTitle 
Address 


City 


Region 
PostalCode 
Country 


Phone 
Fax 


'& CustomersTableAdapter [A] 


Figure | 


The Customers TableAdapter 
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a CustomersTableAdapter is created behind the scenes 
along with its own CommandCollection. 

I expect most developers understand the pro- 
cess of creating a TableAdapter—if you need a 
refresher, I discussed how to run the TableAdapter 
Configuration Wizard in “Generate a Clickthrough 
Report using Visual Studio ReportViewer,” 
InstantDoc ID 99192, and a helpful overview is 
available on the MSDN site at bit.ly/9C4SGe. 
In the Customers TableAdapter in Figure 1, the 
CommandCollection has two methods: Fill and 


LISTING I: Executing the TableAdapter 
Fill Method 


Private Sub btnRun_Click(ByVal sender As System.Object, ByVal e As System 
.EventArgs) Handles btnRun.Click 
Try 
Dim taCustomers As New northwndDataSetTableAdapters.CustomersTableAdapter 
Dim tb As New northwndDataSet.CustomersDataTable 
taCustomers.SelectCommandTimeout = 6% 
taCustomers.Fil1(tb) 
DataGridViewl.DataSource = tb 
Catch ex As Exception 
MsgBox(ex.Message) 
End Try 
End Sub 


THE UNIVERSAL APPROACH 


Another approach to setting the CommandTimeout property is to walk 
through the TableAdapter CommandCollection and set the CommandTimeout 
property for each Command in the collection. This shotgun approach means 
you won't miss any underlying Fill methods that exist at the time you wrote 
your partial class, but it will also affect any additional Fill methods you add 
later. The code for this approach is shown in Listing A. 


LISTING A: The AllCommandTimeouts Property 


Public Property Al1CommandTimeouts() As Integer 
Get 
' Return just the first Command's command timeout 
Return (Me.CommandCollection(@) .CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
For i As Integer = Ø To Me.CommandCollection.Length - 1 
If (Me.CommandCollection(i) IsNot Nothing) Then 
DirectCast((Me.CommandCollection(i)), _ 
System.Data.SqlClient.SqlCommand) .CommandTimeout = value 
End If 
Next 
End Set 
End Property 


GetData. The GetData methods aren’t exposed in 
the CommandCollection. 


Generating the 
CommandCollection 

One tricky part of this process is that the Table- 
Adapter CommandCollection isn’t built until the 
Fill method is executed for the first time. This isn’t 
an insurmountable problem. Pll start by writing 
some code (shown in Listing 1) to instantiate the 
TableAdapter and execute the Fill method. 

Before you execute this code, open the .VB (or .CS) 
file generated by the TableAdapter Configuration 
Wizard that implements the TableAdapter class. In 
my example, it’s northwndDataSet.Designer. VB. 
Use Edit, Find and Replace, Quick Find to locate 


Protected ReadOnly Property CommandCollection() 
As Global.System.Data.Sq]Client.Sq1Command() 


In C#, look for 


protected global: :System.Data.SqiClient 
.Sq]Command[] CommandCollection { 


Set a breakpoint in the property Get routine on 
the Return statement and then run the code shown 
in Listing 1. When Visual Studio stops at the 
breakpoint, hover over the Me._Connection (this._ 
commandCollection in C#) object. (The object gets 
written to the CommandCollection on initialization.) 
Click the plus sign to expose the underlying property 
state of the _commandCollection and SqiCommand 
objects it contains, as shown in Figure 2. 

What you're looking at here is the _command- 
Timeout property exposed behind the scenes in the 
generated code—that’s the property you need to set to 
affect the query timeout value and that’s what the Partial 
class I’m about to create will do. A warning: It might be 
tempting, but don’t try to change the generated code. It’s 
an exercise in frustration because this code is overlaid 
whenever the TableAdapter designer is invoked. 


Implementing 


Protected ReadOnly Property CommandCollection() As Global.System.Data.Sq 


Get 
If (Me._commandCollection Is Nothing) Then 
Me.InitCommandCollection 

End If 


 Me._commandCollection | {Length=1} | 


End Get 
End Property 


CommandText 


CommandTimeout 30 
CommandT ype Text {1} 
Connection 
Container 


<Global.System.Diagnostics.Dek _ 
Public Property ClearBeforeFil 


Nothing 


a (0) | {System.Data.SqlClient.SqlCommand} | 
p 4 ~ "SELECT CustomerID, CompanyName, Conti 


{System.Data.SqlClient.SqlConnection} 


Timeout 
Properties ina 
Partial Class 

To create a Partial class 
against the generated 
TableAdapter code, 
you'll need to go to the 
Solution Explorer, right- 
click the TableAdapter 
.xsd file created by the 


TableAdapter Configura- 


Figure 2 
The Customers TableAdapter CommandCollection 
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tion Wizard, and choose 
View Code. This opens a 
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new tab (named after the data source) in the Visual 
Studio code editor that contains the following two 
lines of code: 


Partial Class northwndDataSet 
End Class 


In this example, the TableAdapter is named 
northwndDataSet.xsd. That name is a clue to the 
name of the generated TableAdapters (there could 
be several). This process must be repeated for each 
and every TableAdapter and in each and every 
data set and data source your project uses (another 
argument for an easier way to set this important 
property). 

To build your custom Partial class, follow these 
steps: 

1. Add a Namespace designation as the first 
line of the Partial class file. This should match the 
targeted .XSD file that was generated by Visual 
Studio’s TableAdapter Configuration Wizard. For 
example 


Namespace northwndDataSetTableAdapters 


2. Add an End Namespace designation at the 
end of the file. 

3. Next, start entering the code shown in 
Listing 2. IntelliSense should help fill in the blanks. 
If you're unsure of the name of the generated 
TableAdapter Namespace, open xxx.Designer. VB 
and search for the Namespace definition. Gener- 
ally, all of the files in the project hidden under the 
.xsd file begin with the name you chose followed by 
<your name>DataSetTableAdapters. Of course, if 
you code in C#, you're on your own—this process 
is a lot harder to code. I suggest you convert from 
the Visual Basic example. 


Implementing the DML 
CommandTimeout Properties 
and the SELECT Timeout 
Property 
The code shown in Listing 2 illustrates the Partial class 
needed to implement the four CommandTimeout 
properties. Of course, this assumes you need to imple- 
ment timeout properties for the InsertCommand, 
UpdateCommand, and DeleteCommand commands 
in the first place. Note that these are fairly simple prop- 
erty set/get routines that access the generated DML 
queries (once). That is, regardless of the number of 
Fill methods you create, all DML operations are done 
via these three commands when changes must be made 
to the Data Source table. 

The SelectCommandTimeout property near the 
end of Listing 2 is another matter. In this case, you 
must reference the CommandCollection directly. 
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TABLEADAPTER TIMEOUTS 2 i 


LISTING 2: The Partial Class to Manage 
Command Timeouts 


Namespace northwndDataSetTableAdapters 
Partial Public Class CustomersTableAdapter 
' Expose Command timeouts for Select, Insert, Update and Delete operations 
' as well as a way to set all Command timeouts at once. 
Public Property InsertCommandTimeout() As Integer 
Get 
Return (Me.Adapter.InsertCommand.CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
Me.Adapter.InsertCommand.CommandTimeout = value 
End Set 
End Property 
Public Property UpdateCommandTimeout() As Integer 
Get 
Return (Me.Adapter.UpdateCommand.CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
Me.Adapter.UpdateCommand.CommandTimeout = value 
End Set 
End Property 
Public Property DeleteCommandTimeout() As Integer 
Get 
Return (Me.Adapter.DeleteCommand.CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
Me.Adapter.DeleteCommand.CommandTimeout = value 
End Set 
End Property 
Public Property SelectCommandTimeout() As Integer 
Get 
Return (Me.Adapter.SelectCommand.CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
' Note that Me.Adapter.SelectCommand might not exist 
' Reference to CommandCollection() 
' might trigger creation of CommandCollection 
If Me.CommandCollection(@) Is Nothing Then 
' CommandCollection not created 
' Probably should throw an exception here... 
Else 
" SelectCommand is created from the CommandCollection 
Me.CommandCollection(@) .CommandTimeout = value 
End 
End Set 
End Property 
Public Property AllCommandTimeouts() As Integer 
Get 
" Return just the Select timeout 
Return (Me.CommandCollection(@) .CommandTimeout) 
End Get 
Set(ByVal value As Integer) 
For i As Integer = Ø To Me.CommandCollection.Length - 1 
f (Me.CommandCollection(i) IsNot Nothing) Then 
DirectCast((Me.CommandCollection(i)), _ 
System.Data.Sq]lClient.SqlCommand) .CommandTimeout = value 
End If 
Next 
End Set 
End Property 
End Class 
End Namespace 


If you step through to the underlying code, you'll 
discover that Me.Adapter.SelectCommand doesn’t 
exist—not until CommandCollection is initialized. 
Simply referencing it does the trick. Because you know 
that the first Fill method’s SelectCommand is placed 
in CommandCollection(0), you know where to find 
the SELECT command’s query timeout setting. This 
means you can change the Timeout property for each 
Fill methods you want to alter. It’s a pain, and it’s not 

necessary to change each command in some cases. 
Again, I think it would be a lot better if Microsoft 
made a few changes to TableAdapter to implement this 
correctly via a direct property setting in the designer. 
The approach I’ve shown here is far too expensive and 
delicate to be used without caution. But frankly, it’s all 
we have. SOL} 
InstantDoc ID 103581 
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Efficient 


SQL Server 2008, Part | 


Use sparse columns to help reduce data 


storage requirements 


ven in this era of dirt-cheap terabyte storage 
[= devices, efficient data storage and storage 

management are still prime concerns for both 
DBAs and developers. So it’s particularly vexing that 
in a typical SQL Server database, null data takes up 
storage space—potentially several bytes for each field 
in each row, depending on the data type of the field. 
SQL Server has to record that there is no data in 
some way. 

Sparse columns are a tool Microsoft provides in 
SQL Server 2008 to help minimize data storage space. 
Sparse columns provide an efficient way to store data 
that’s predominantly null because they require no disk 
storage space for the null values. This situation sounds 
great, but there’s a cost: In a field marked as sparse, 
non-null data takes up a bit more storage space than 
if the field weren't sparse—either two or four addi- 
tional bytes, depending on the data type of the field. 
The Microsoft article “Using Sparse Columns” (msdn 
-microsoft.com/en-us/library/cc280604.aspx) has all the 
details, but it boils down to four bytes for most data 
types and either two or four bytes for data-dependent- 
length data types, such as varchar, varbinary, XML, 
and hierarchyid. 

Microsoft recommends using sparse columns for 
fields in a table when the overall storage space sav- 
ings will be 20 to 40 percent. Sparse columns usually 
make sense when there’s a lot of null data in the field, 
which often happens when the database supports an 
object-relational mapping architecture where several 
concrete classes inherit from an abstract class and the 
data is all saved in a single table. In that case, not all of 
the derived concrete classes use all the available fields, 
and you're wasting storage space even with null data. 
(You could implement an object-relational mapping 
support database with multiple tables, but it would be 
more complex and harder to maintain.) 


Restrictions on Sparse Columns 
As you'll see later in this article, making a column 
sparse is as easy as using the SPARSE keyword in a 
CREATE or ALTER TABLE statement. But SQL 
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Server has a number of restrictions on the use of sparse 

columns. The most important restrictions are: 

e A sparse column must be nullable, of course. It 
makes no sense to make a non-null column sparse 
because it can never include null data. 

e You can’t use ROWGUIDCOL or IDENTITY 
properties on the column. 

e The column can’t have a default value. This restric- 
tion makes sense when you consider that a default 
value makes it less likely that you'll have null data in 
the field. 

e You can’t use sparse columns in a user-defined table 
type. 

e If a table has sparse columns, you can’t compress it 
at either the row or page level. 


Those restrictions aren't all that onerous, but there are 
a few other things to consider. You can’t use sparse col- 
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Sparse columns provide an efficient way 
to store data that’s predominantly null 
because they require no disk storage 
space for the null values. This sounds great, 


but there’s a cost. 


umns with all data types; forbidden types are geography, 
geometry, text, ntext, timestamp, image, and user-defined 
data types. You can’t make varbinary(max) fields that 
use FILESTREAM storage sparse. You also can’t mark 
a computed column as sparse, but you can use a sparse 
column in the calculation of a computed column. 

You could face difficulties if you need to change 
the schema of a table that contains sparse columns. 
Changing the schema requires some work space within 
a row, so if the row size is near 4,009 bytes (close to half 
the maximum size), adding or removing a sparse column 
can fail. SQL Server needs to make a copy of the data 
but might not have room to do it. In practice, you can 
save the data to a temporary table, drop and recreate 


April 2010 31 


32 April 2010 


DATA MANAGEMENT 


LISTING I: Creating the Dog Table 


CREATE TABLE Dog 
C 


DogID INT NOT NULL PRIMARY KEY IDENTITY(1,1), 
Name NVARCHAR(2Ø) NOT NULL, 

LastRabies DATE NULL, 

Sleddog BIT NOT NULL, 

Handedness NVARCHAR(5) SPARSE NULL, 
BirthDate DATE SPARSE NULL, 

BirthYear INT SPARSE NULL, 

DeathDate DATE SPARSE NULL, 

[weight] INT SPARSE NULL, 

Leader BIT SPARSE NULL, 

Rescue BIT NULL, 

OnGoingDrugs NVARCHAR(5Ø) SPARSE NULL, 
SpayNeuter BIT NULL 


the table with the sparse columns, then move the data 
back—a bit of a hassle, but not the end of the world. 

A table with sparse columns has a slightly reduced 
maximum row size: 8,018 instead of the usual 8,060. 
SQL Server needs the additional space for a sparse 
vector that it uses to manage the sparse columns in 
the table. Additionally, all fixed-length non-null sparse 
columns have a maximum length of 8,019 bytes. But 
the really good news is that you can have up to 30,000 
sparse columns in a table, so you can get just about as 
carried away as you want with a table schema. 

All of these restrictions are manageable and 
generally don’t severely restrict the use of sparse 
columns. 


Column Sets 
A table with one or more sparse columns can also have 
a column set field. A column set is an alternative way to 
view and work with all of the sparse columns in a table. 
A column set is an untyped XML column that you can 
use to get and set all sparse column values as a set. It’s 
similar to a computed column in that the column set 
isn't physically stored in the table, but you can update the 
values of the sparse columns through the column set. 

A column set simplifies working with sparse col- 
umns in a table because it aggregates the underlying 
relational columns into an XML snippet. Even if your 
table has only a few sparse columns, a column set can 
simplify working with the data. I'll show you how this 
works in part 2 of this article in next month’s issue. 

Like sparse columns, column sets 


essentially an updateable computed column that 
stores no data directly. 


There are other restrictions documented in the Micro- 
soft article “Using Column Sets” (msdn.microsoft 
.com/en-us/library/cc280521.aspx), so be sure to check 
out the details. 


Good Idea or Bad Design? 

Sparse columns can solve certain kinds of problems 
with database design, although they might offend 
database design purists. You should never use sparse 
columns as an excuse for poor database and table 
design. Some uses of sparse columns denormalize 
data, which you should only do with good reason. 
Also, look at the performance and storage changes for 
your data. You might see an improvement in either—or 
you might not—depending on the nature of the data 
and the patterns of use in your applications. 

Always use careful database design and imple- 
mentation. As cool as sparse columns are, they aren’t 
appropriate for every scenario, particularly when you're 
tempted to violate normalization rules in order to cram 
more fields into a table. 


Working with Sparse Columns 

To explore the features of sparse columns, we'll work 
with a table called Dog, which contains information 
about the dogs in my household, stored in the tempdb 
database. I have 31 dogs; most of them are working or 
retired sled dogs here in Fairbanks, Alaska. The code 
in Listing 1 creates the table. 

Some of the dogs are pets, others are working sled 
dogs, and all have various characteristics that I want 
to record. Some columns apply to all dogs; others 
apply to just the working sled dogs. And some of the 
columns apply in theory to all the dogs, but contain 
data for only a few. Table 1 lists the columns and some 
information that forms the basis for deciding whether 
each column is a candidate to be a sparse column. 

As you can see, there are some decisions inherent 
in the design that might be good or bad. But let’s go 


have a few restrictions. Some of the more LISTING 2: Entering Values in the Dog Table 


significant restrictions you're likely to 

face include: 

e You can’t change a column set after it 
exists in the table. Instead, you must 


e You can’t add a column set to a table 
that already has sparse columns. This 
restriction is probably the most severe. 

e You can have only one column set per 


of the sparse columns in that table. 


INSERT INTO Dog ([Name], [LastRabies], [Sleddog], [BirthDate], 
[Weight], [Rescue], [OnGoingDrugs], [SpayNeuter]) 
VALUES ('Mardy', '11/3/2005', Ø, '6/30/1997', 62, 1, 
"Metacam, Phenylpropanolamine, Rehab Forte', 1); 

INSERT INTO Dog ([Name], [LastRabies] 
[Leader], [Rescue], [SpayNeuter]) 

drop it, then re-create it. VALUES C‘Izzi", "11/3/2005", 1, 2801, 8, 1, 1); 

INSERT INTO Dog ([Name], [LastRabies] ] 
[OnGoingDrugs], [SpayNeuter]) 
VALUES ('Jewel', '9/23/2007', Ø, 1, 'Rehab Forte', 1); 

INSERT INTO Dog ([Name], [LastRabies], [Sleddog], [BirthYear], 

[Leader], [Rescue], [SpayNeuter]) 
VALUES ('Casper', '10/17/2007', 1, 2002, 1, 1, 1) 

INSERT INTO Dog ( 
[Weight], [Leader], [Rescue], [SpayNeuter]) 

table; the column set encapsulates all VALUES ('Chance', '9/23/2007', 1, 2002, 36, 1, 1 

INSERT INTO Dog ( ] 
[weight], [Leader], [Rescue], [OnGoingDrugs], [SpayNeuter]) 


, [Sleddog], [BirthYear], 


, [Sleddog], [Rescue], 


[Name], [LastRabies], [Sleddog], [BirthYear], 


, 1); 
[Name], [LastRabies], [Sleddog], [BirthDate], 


e There can be no constraints or default VALUES ('Daikon', "10/17/2087", 1, '2/14/1997", 58, 1, 9, 
"Potassium bromide, DMG', 1); 


values on the column. Remember, it’s 
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with this structure. After the table is created, you can 
populate it using regular INSERT statements. The 
code in Listing 2 inserts the data for several dogs. A 
T-SQL row constructor wouldn’t work because each 
insert must have the same fields. But in this case, not 
all dogs have information for all fields, and it would 


DATA MANAGEMENT 


BEGIN TRANSACTION 
DELETE FROM Dog 
WHERE [Weight] < 48 OR 
DeathDate IS NOT NULL; 
SELECT * FROM Dog; 
ROLLBACK TRANSACTION 


be cumbersome to include all the null values necessary 
to create each row. 

Figure 1 shows the contents of the Dog table 
after you’ve inserted the data. 
Notice that the sparse columns 
have very little non-null data, 
so we mostly made the right 
decisions about which columns 
to make sparse. The one column 
that we might want to reconsider 
is the Leader column, which has 
only two nulls in the six rows of 
data. But it might make sense 
to make it sparse when all the 
dogs are inserted into the table, 
instead of just this sample. 

There’s nothing function- 
ally different about a table that 
includes sparse columns; only 
the storage has changed. All 
the typical INSERT, UPDATE, 
and DELETE statements work 
as with any other table that 
doesn’t have sparse columns, 
and you can wrap operations 


Figure 2 shows the results of executing the three 
previous statements. 


TABLE l: Characteristics of Columns in the Dog Table 


DogiD This is the primary key identity column, which can’t be null. For that reason, it can’t be a sparse 
column. 


LastRabies This date column records the date of the dog’s last rabies shot. All the dogs are current with 
rabies shots except for a couple with specific medical conditions that preclude the shots. Because 
this field has data for most dogs, it shouldn’t be sparse. 


Handedness Sled dogs can have a side preference on the trail, such that they'll push their running mate off 
the trail if they're on the wrong side. This field applies only to sled dogs, and only a few dogs have 
this characteristic, so this column is a good candidate to be a sparse column. 


BirthYear This column is a bit of a denormalization of the table schema. Rather than using bogus month 
and day information in the BirthDate field, | record the year here. This column is a candidate to be 
sparse because it applies to only a few dogs. 
in transactions as usual. For 


example, the following state- as errr ie area 


ments work on the Dog table as Weight | usually keep track of the weight of a dog only if it has some medical condition that makes it hard 
you'd expect: to keep weight on it. This situation applies to only a few dogs, so this column is a sparse column. 


Rescue This column indicates whether the dog was rescued from the animal shelter or another emergency 
situation. | always know this information, so the column isn’t a good candidate to be sparse. 


SpayNeuter All the dogs are spayed or neutered, no exceptions, to keep down the number of dogs dumped at 
the shelter. Only a new dog might not yet have been fixed. | always know this about a dog, so it’s 
not a candidate to be sparse. 


SELECT * FROM Dog WHERE 
Leader = 1; 


SELECT Name, Handedness, Rescue 
FROM Dog WHERE Leader IS 
NOT NULL; 


Even this slightly more compli- 
cated example has the expected 


results: 
EJ Results 
DoglID Name LastRabies Sleddog Handedness  BirthDate BithYear DeathDate Weight Leader Rescue - OnGoinaDruas SpayNeuter 
1 Mardy 2005-11-03 0 NULL 1997-06-30 NULL NULL 62 NULL 1 Metacam, Phenylpro... 1 
2 2 Izzi 2005-11-03 1 NULL NULL 2001 NULL NULL 0 1 NULL 1 
3 zi Jewel 2007-09-23 0 NULL NULL NULL NULL NULL NULL 1 Rehab Forte 1 
4 4 Casper 2007-10-17 1 NULL NULL 2002 NULL NULL 1 1 NULL 1 
5 5 Chance 2007-09-23 1 NULL NULL 2002 NULL 36 1 1 NULL 1 
6 6 Daikon 2007-10-17 1 NULL 1997-02-14 NULL NULL 50 1 0 Potassium bromide, ... 1 
Figure I 


The Dog table with inserted data 


SQL Server Magazine * www.sqlmag.com April 2010 33 


DATA MANAGEMENT 


E Results | 24 Messages 
DogID Name LastRabies Sleddog Handedness  BirthDate BitthYear DeathDate Weight Leader Rescue OnGoingDrugs SpayNeuter 
1 (4 Casper 20091017 1 NULL NULL 2002 NULL NULL 1 1 NULL 1 
2 ~ Chance 2009-09-23 1 NULL NULL 2002 NULL 36 1 1 NULL 1 
2 6 Daikon 2009-10-17 1 NULL 1997-02-14 NULL NULL 50 1 0 Potassium bromide, ... 1 
Handedness Rescue 
1 1 
2 Casper 1 
3 Chance 1 
4 Daikon 0 
LastRabies Sleddog Handedness BirthDate Bithear DeathDate Weight Leader Rescue OnGoingDrugs SpayNeuter 
1 2008-11-03 0 NULL 1997-06-30 NULL NULL 62 HULE | 1 Metacam, Phenylpropan... 1 
2 2008-11-03 1 NULL NULL 2001 NULL NULL 0 1 NULL 1 
3 2009-09-23 0 NULL NULL NULL NULL NULL NULL 1 Rehab Forte 1 
4 2009-10-17 1 NULL NULL 2002 NULL NULL 1 1 NULL 1 
5 2009-10-17 1 NULL 1997-02-14 NULL NULL 50 1 0 Potassium bromide, DMG 1 


Figure 2 


The results of executing statements on the Dog table 


You won't see much, if any, difference in storage 
requirements in this trivially small table. But start adding 
10 or 100 million records and you'll see some significant 
storage savings, particularly if the ratio of non-null to 
null values in the sparse columns stays small. 


Work More Efficiently 
In part 2 of this article, appearing in next month’s 
issue of SOL Server Magazine, Vl explore column 
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fa Results Ba Messages 
DoglD -Name LastRabies ‘Sleddog “Handedness — BirthDate Bith’ear DeathDate Weight Leader Rescue OnG oingDrugs SpayN euter 
1 a ; Casper 2009-10-17 1 NULL NULL 2002 NULL NULL 1 1 NULL 1 
Saa | Chance 2009-09-23 1 NULL NULL 2002 NULL 36 1 1 NULL 1 
2 6 Daikon 2009-10-17 1 NULL 1997-02-14 NULL NULL 50 1 0 Potassium bromide, ... 1 
Handedness Rescue o a 

1 1 
2 Casper 1 
3 Chance 1 
4 Daikon 0 

| LastRabies Sleddog Handedness  BirthDate BitthYear DeathDate Weight Leader Rescue OnGoingDrugs SpayNeuter 
1 2008-11-03 0 NULL 1997-06-30 NULL NULL 62 NULL 1 Metacam, Phenylpropan... 1 
2 2008-11-03 1 NULL NULL 2001 NULL NULL 0 1 NULL 1 
3 2009-09-23 0 NULL NULL NULL NULL NULL NULL 1 Rehab Forte 1 
4 2009-10-17 1 NULL NULL 2002 NULL NULL 1 1 NULL 1 
5 2009-10-17 1 NULL 1997-02-14 NULL NULL 50 1 0 Potassium bromide, DMG 1 

Figure 2 


The results of executing statements on the Dog table 


You won't see much, if any, difference in storage 
requirements in this trivially small table. But start adding 
10 or 100 million records and you'll see some significant 
storage savings, particularly if the ratio of non-null to 
null values in the sparse columns stays small. 
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sets and how you can use them to manage access 
to a large number of sparse columns in a table, 
as well as explore filtered indexes. These indexes 
aren't limited to use with sparse columns; they let 
you create a table index on a subset of table rows. 
Along with sparse columns, filtered indexes are a 
tool in SQL Server 2008 to help you work more 
efficiently with your data. SQL 
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Building a 


SOL Server 


Inventory Report 
Use SSIS and SSRS to build a solution that lists your 


SOL Server instances 


f you have multiple SQL Server instances in your 
environment, it can be difficult to keep the details of 
each system straight. This is where a list, or inven- 
tory, of your SQL Server instances can come in handy. 
A SQL Server inventory stores information about the 
SQL Server instances in your environment. For example, 
you can search your SQL Server inventory to see which 
SQL Server 2005 instances are currently running SP3. 
When it comes to building a SQL Server inven- 
tory, you can either buy a commercial product or 
create your own solution. There are several benefits 
to building your own solution. For instance, you can 
add or remove features as needed. By using SQL 
Server to build your inventory, you acquire the most 
compatibility and you don’t incur extra cost, installa- 
tion, or resource burden. In this article, I'll introduce a 
home-grown method to build a SQL Server inventory 
that can be automatically updated. It requires a SQL 
Server system that acts as the inventory server, which 
we run a SQL Server Integration Services (SSIS) 
package from and publish a SQL Server Reporting 
Services (SSRS) report to. With SQL Server 2008 R2’s 
business intelligence (BI) features, we could build the 
entire solution without much difficulty. 


The XML File that Lists SQL 
Server Instances 

We built a SQL Server 2008 R2 CTP instance on a 
virtualized Windows Server 2003 box with dual-core 
processors and 4GB of memory for the inventory 
server. The SQL Server installation also includes SSIS 
and SSRS. The SQL Server Agent service runs under 
a domain Windows account that has sufficient privi- 
leges on all the target SQL Server instances. 

In order to add SQL Server instances to the inven- 
tory, we need to know where they are. This solution was 
originally designed by leveraging SQL Server's built-in 
SQLCMD command-line utility with the -L option so 
that it could list all the SQL Server instances broad- 
casting on the network without human input. However, 
it turns out we aren’t necessarily interested in every SQL 
Server instance found on our network. A majority of 
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these servers are desktop SQL Server database engines 
and SQL Server Developer Edition instances. Without 
a good naming convention, they're very hard to remove 
from the server list. In addition, it’s very difficult to 
update the inventory with environment and application 
information. In the end, we adopted an XML format for 
listing SQL Server instances that must be updated manu- 
ally. It’s a one-time effort because in most environments, 
SQL Server system names don’t change very often. You 
only need to update the list when a new SQL Server 
instance is installed or an existing instance is removed or 
renamed. The SSIS package will scan this XML file every 
time it starts and run everything else in this package. A 
sample of the source XML file is in Listing 1, accom- 
panied by an .xsd file in Web Listing 1 (www.sqlmag 
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.com, InstantDoc ID 103611). You can modify 
the ServerName, Environment, and AppName 
attributes to match your environment, and 


ORE on the WEB 


See the web figures and listings 
at InstantDoc ID 103611. 


remove or add sections between the row tags. 


Creating and Running 
the SSIS Package 
Information is extracted from the 
target SQL Server systems during 
the extraction, transformation, and 
loading (ETL) process. We’ve built 
an SSIS package using SQL Server 
2008 R2 (CTP) Business Intelli- 
gence Development Studio (BIDS), 
as shown in Figure 1. The .dtsx defi- 
nition of the SSIS package is shown 
in Web Listing 2. You can import 
this package into BIDS to review 
and edit it. You can save it in a file 
directory and build a scheduled 
SQL Server job to point at its path. 
You can also import it into msdb 
and manage it from there. 

Here’s how the package works. 
The SSIS package imports the XML 
file into SQL Server. It then maps 
the list of SQL Server instances to a 


LISTING I: A Sample of the 
Source XML File 


<SQLInventory> 

<row> 
<ServerName>DEVSQL991</ServerName> 
<Envi ronment>Dev</Envi ronment> 
<AppName>Biz</AppName> 

</row> 

<row> 
<ServerName>QASQL923</ServerName> 
<Environment>QA</Envi ronment> 
<AppName>SSRS 2888 Test</AppName> 

</row> 

<row> 
<ServerName>PRODSQL191</ServerName> 
<Envi ronment>Prod</Environment> 
<AppName>Inventory</AppName> 

</row> 

<row> 
<ServerName>StageSQL@15</ServerName> 
<Envi ronment>Staging</Envi ronment> 
<AppName>Accounting</AppName> 

</row> 

<row> 
<ServerName>PRODSQL@15</ServerName> 
<Envi ronment>Prod</Envi ronment> 
<AppName>Accouting</AppName> 

</row> 

<row> 
<ServerName>SandSQL416</ServerName> 
<Envi ronment>Sandbox</Envi ronment> 
<AppName>Order Entry</AppName> 

</row> 

</SQLInventory> 
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The SSIS package in BIDS 


LISTING 2: Script that Creates the 
SQLServerInventory Table 


context of the package. You'll also need to create 
two package-level variables. The VarDynamicCONN 
variable is of the string data type and will be assigned 
dynamic values of target SQL Server connections. The 
VarSQLServerList variable is an object type that will 
be used to accept the ADO Recordset object, which 
contains records of all your SQL Server instances. 
The package first runs the EST-Create local 
master table Execute SQL task. This task creates 
the SQLServerInventory table in the master data- 
base on the inventory server. Listing 2 provides the 
script that builds the SQLServerInventory table. The 
second task that the package runs is the DFT-Jmport 
XML list to local master table Data Flow task. It 
imports a list of SQL Server instance names from 
the XML file, converts data from Unicode to varchar 
through the DC-Convert Unicode Data Conversion 
transformation, then loads the data into the master 
table. This step populates only three columns in the 
table—ServerName, Environment, and AppName. 
The rest of the columns are left NULL. Note in 


memory variable and 
passes it to a Foreach 
Loop container, inside 
of which it’s mapped to 
dynamic SQL Server 
connections. The loop 
reaches out to each 
SQL Server instance 
to pull version infor- 
mation and stores the 
information in each 
SQL Server instance’s 
tempdb database. The 
data subsequently flows 
into the inventory serv- 
ers tempdb database 


-- EST-Create local master table 
-- Script to build a raw base SQL inventory table: 


IF OBJECT_IDC'master.dbo.SQLServerInventory', 'U') IS NOT 
NULL DROP TABLe [master].dbo.SQLServerInventory; 


CREATE TABLE [master].dbo.SQLServerInventory ( 
ServerId INT IDENTITY PRIMARY KEY clustered, 
ServerName VARCHAR(3Q) NULL, 

OSVersion VARCHAR(15) NULL, 

OSSP VARCHAR(5) NULL, 

Memory VARCHAR(1@) NULL, 

CPU VARCHAR(5@) NULL, 

NetBIOSName VARCHAR(2®) NULL, 

MachineName VARCHAR(2@) NULL, 

InstanceName VARCHAR(1@) NULL, 

SQLEdition VARCHAR(5@) NULL, 

SQLVersion VARCHAR(79) NULL, 

SQLSP VARCHAR(15) NULL, 
ResourceLastUpdateDateTime VARCHAR(3@) NULL, 
BuildC]rVersion VARCHAR(2@) NULL, 

Collation VARCHAR(5@) NULL, 

IsClustered bit NULL, 
IsIntegratedSecurityOnly bit NULL, : . 
Environment VARCHAR(1@) NULL, and is updated in the 
AppName VARCHAR(5@) NULL, final table. 

ModifiedDate DATETIME NULL DEFAULT CURRENT_TIMESTAMP : 

Dg Let’s walk through 


the package in more 
detail. First, create a 
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new SSIS project in BIDS. Our naming standard 
requires prefixing tasks with the initials of the task 
name. For example, you would use EST for Execute 
SQL task and DFT for Data Flow task. Once you’ve 
created the SSIS package, you need to create a few 
connections: two OLE DB connection managers 
that point to the master database and tempdb on the 
inventory server, and the DynamicCONN connection 
manager, which serves as a placeholder for dynamic 
connections to target SQL Server instances in the 


the XML Source task, .xml and .xsd files are stored 
at the C:\ root, as shown in Web Figure 1. A Data 
Conversion component converts between XML 
Unicode and varchar data types in the destination 
table. 

Next, the package runs the EST-Create local 
tempdb staging table task that creates the SQLServer- 
InventoryStaging table in the inventory SQL Server 
system’s tempdb. The schema script to create this 
table can be found in Listing 3. Now, the package 
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Figure 2 


A Data Flow task sending a list of SQL Server 
instance names to an ADO in-memory recordset 


will run the DFT-List SQL Servers into memory Data 
Flow task. This task retrieves the SQL Server instance 
names with a SELECT statement as a Data Flow 
source, as the following code shows: 


SELECT ServerName FROM 
[master] . [dbo] . [SQLServerInventory] 
ORDER BY ServerId ASC 


(Note that some lines of code in this article have been 
wrapped.) The Data Flow task populates an ADO 
in-memory recordset (the VarSQLServerList vari- 
able), as shown in Figure 2. 

The core of this package is the FEL-Access each 
SQL instance Foreach Loop container. It works by 
reiterating through a collection (in this case, the ADO 
object source VarSQLServerList variable) and map- 
ping the enumerator value (a SQL Server system name) 
in each loop to the user-defined VarDynamicCONN 
variable, as Web Figure 2 shows. 

To make the dynamic SQL Server connection, the 
updated variable User::VarDynamicCONN is used as 
a Property Expression for the DynamicCONN con- 
nection manager, as shown in Figure 3, thus forcing 
the ServerName property in a connection manager 
to use the dynamic value in our target SQL Server 
instance list. You can see that inside the loop, the 
EST-Build remote tempdb table Execute SQL task 
is able to connect to each target server. (For the 
EST-Build remote tempdb table Execute SQL task 
schema script, see Web Listing 3.) It basically builds 
a staging table in the target SQL Server instance’s 
tempdb database and saves all information in it. 
The next task is a Data Flow task that simply maps 
and transports data columns in the target server’s 


LISTING 3: Script that Creates the 
SQLServerlInventoryStaging Table 


-- EST-Create local tempdb staging table 
-- Create Staging table in tempdb 


IF OBJECT_ID('tempdb.dbo.SQLServerInventoryStaging', 'U') IS NOT NULL 


DROP TABLE tempdb.dbo.SQLServerInventoryStaging; 


CREATE TABLE tempdb.dbo.SQLServerInventoryStaging ( 
ServerName VARCHAR(3@) NOT NULL PRIMARY KEY CLUSTERED, 
OSVersion VARCHAR(15) NULL, 

OSSP VARCHAR(5) NULL, 

Memory VARCHAR(1@) NULL, 

CPU VARCHAR(59) NULL, 

NetBIOSName VARCHAR(29) NULL, 
MachineName VARCHAR(2@) NULL, 
InstanceName VARCHAR(1@) NULL, 
SQLEdition VARCHAR(59) NULL, 
SQLVersion VARCHAR(7%) NULL, 
SQLSP VARCHAR(15) NULL, 
ResourceLastUpdateDateTime VARCHAR(39) NULL, 
BuildC]rVersion VARCHAR(2@) NULL, 
Collation VARCHAR(59) NULL, 
IsClustered BIT NULL, 
IsIntegratedSecurityOnly BIT NULL 


The following is the T-SQL query for the target SQL 
Server instance: 


SELECT * FROM [tempdb] . [dbo] 
. [RemoteSQLServerInventoryStaging] 


The following is the code for the central inventory SQL 
Server system: 


SELECT * FROM [tempdb] . [dbo] 
. [SQLServerInventoryStaging] 


While the Foreach Loop is executing, all target SQL 
Server system information is dumped into the central 
staging table (SQLServerInventoryStaging). By the 
time the loop ends, the staging table has a wealth 
of values for our SQL Server farm. You can modify 
the collection of values returned using the UPDATE 
statement in Listing 4. The last EST- Update base table 
task updates the SQLServerInventory table through 
an inner join from the staging data. 

The SSIS package can be scheduled as a SQL Server 
job, as shown in Figure 4. The job definition is in Web 
Listing 4. Be sure to update the .dtsx file path when 
you check the job script in your 
SQL Server instance. The script 
also assumes that SQL Server 


Property expressions: 
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Database Mail has been set up Property 
properly and an operator called 
DBA can be used for job notifica- 
tion. In my environment, it runs at 
12 a.m. daily, but the schedule and 


frequency can be customized. 


ServerName 
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Expression 


-v| @lUser:VarDynamicCONN] 


Building the SSRS 
Report 

You can write up a simple 
SELECT statement in SQL Server 


Figure 3 


User::VarDynamicCONN used as a Property 


tempdb table to the inventory server’s tempdb table. Management Studio (SSMS) to Expression for DynamicCONN 
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LISTING 4: Script to Update the SQLServerlnventory 


Table 


-- Update master inventory table: 


UPDATE m 


SET m.OSVersion = t.OSVersion, 


m.OSSP = t.OSSP, 
m.Memory = t.Memory, 
«CPU = t. CPU; 


»SQLSP = t.SQLSP, 


.BuildClrVersion = 
-Collation = t.Coll 


EE EE Ee) EE Shee 


m.IsIntegratedSecuri 


.NetBIOSName = t.NetBIOSName, 
-MachineName = t.MachineName, 
.InstanceName = t.InstanceName, 
-SQLEdition = t.SQLEdition, 
-SQLVersion = t.SQLVersion, 


.ResourceLastUpdateDateTime = t.ResourceLastUpdateDateTime, 


t.BuildCirVersion, 
ation, 


.IsClustered = t.IsClustered, 


tyOnly = t.IsIntegratedSecurityOnly, 


m.ModifiedDate = CURRENT_TIMESTAMP 
FROM [master] . [dbo]. [SQLServerInventory] m WITH (NOLOCK) 
JOIN [tempdb] . [dbo]. [SQLServerInventoryStaging] t WITH (NOLOCK) 
ON m.ServerName = t.ServerName 


IBS Job Step Properties - Run SSIS package job 


General 
2 Advanced 
Step name: 


U Hep 


Run SSIS package job 


Type: 


SOL Server Integration Services Package 


Run as: 


[SOL Server Agent Service Account 


| Command line 
_ mmane 


Server: 
bosws06835\sql200812 


Connection: 
AMSA\iding 


5} View connection properties Package: 


General | Configurations | Command files | Data sources | Execution options | Logging | Set values 


Package source: 


[CASSIS\2008\My_ SSIS_2008_Bundle\My_SSIS_2005_Bundle\UpdateSLServerlnventoyy ffinalldtsx | (.. 
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Figure 4 


Creating a SQL Server job that executes the SSIS package 
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check the results of the SQL Server job in a tra- 
ditional way. However, to make the SQL Server 
inventory list more presentable, secure, and easily 
accessible, I’ve built an SSRS report. An elaboration 
on how to build an SSRS report is beyond the scope 
of this article. (For more information about get- 
ting started with SSRS, see “SQL Server Reporting 
Services,” June 2007, InstantDoc ID 95745.) How- 
ever, I'll mention a few key points that might help 
you successfully implement this report. The Report 
Server is a SQL Server 2008 R2 instance that uses 
a Windows domain account to start up the SSRS 
service. When configuring the Web Server URL and 
the Report Manager URL, I chose TCP port number 
8888 because the default port 80 has been used by the 
default SQL Server 2005 SSRS instance. I also chose 
to install the two default reporting temporary databases 
(ReportServer and ReportServerTempDB) on a local 


instance. To confirm the reporting site is up, you'll 
use the Report Manager URL (DBAServer:8888/ 
reports_SQL2008R2) and the Report Web Service 
URL (DBAServer:8888/reportserver_SQL2008R2). 

We're still working in BIDS 2008, but this time 
we'll create a new Report Server project called DBA 
Reports. Then, we'll need to create a shared data source 
called LocalDataSource, which points to our inven- 
tory SQL Server system via Windows authentication 
(integrated security). It’s mapped to the data source 
in Report Data. A data set is created afterward that 
has a dynamic SQL query with @ServerName as its 
parameter, as the following code shows: 


EXEC ('SELECT * FROM [master]. [dbo] 
. [SQLServerInventory] WITH (NOLOCK) 
WHERE ServerName LIKE ''%'+ @ServerName + '%'' 
ORDER BY ServerName ASC') 


This parameter lets you filter results based on the 
LIKE operator. The preview pane shows the report in 
its expected format and with cosmetic effects. 


Deploying the Report 

Now you're ready to deploy the report. In the Solution 
Explorer in BIDS, make sure the project deployment 
configurations have the correct values—TargetData- 
SourceFolder (Data Source by default), TargetReport- 
Folder (DBA Reports), and TargetServerURL (the 
Report Manager URL mentioned previously). The 
publishing process pushes the data source .rds file and 
report .rdl file to the report site. You can deploy the 
LocalDataSource.rds data source file, shown in Web 
Listing 5, and the SQL Server Inventory Report (legal 
paper ).rdl report file, shown in Web Listing 6, using the 
Report Manager’s Upload File button. Alternatively, 
you can deploy these files from within BIDS. Once 
the report has been deployed, you should check it in 
the Report Manager using the Report Manager URL. 
Web Figure 3 shows the SQL Server Inventory Report 
listed in the DBA Reports folder. 

To view the inventory report, click DBA Reports 
and then SQL Server Inventory Report (legal paper). 
By default, the report retrieves all the records in the 
SQLServerInventory table. Note that the NULL 
check box, which is next to the SQL Server Name 
text box, is selected. If you clear the check box, you 
must fill in a string in the SQL Server Name text box 
that forces the report to display results that match the 
string pattern. For example, if you want to see a list 
of only the SQL Server instances with the number 7 
in their names, simply type 7 in the SQL Server Name 
text box. 

Figure 5 shows a complete view of the report. The 
report shows the SQL Server instance name, the last time 
the inventory was updated, the user name that runs this 
report, and the time when the report runs. Almost all the 
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Figure 5 
The SQL Server Inventory Report 


column headers in the report permit sorting. In the menu 
area, there are several standard buttons that let end users 
print, zoom in and out, search, browse pages, and save 
the report in one of the seven available formats (Word, 
Excel, comma-separated value—CSV, XML, HTML, 
PDF, TIFF). For printing convenience, the report has 
been tailored to fit an 8.5 x 14" legal-size paper. In the 
upper right corner of the report, you can insert your 
company’s logo. The report user can also subscribe to 
the report via email or shared directory. To facilitate 
easy viewing, the report Env column is color-coded. The 
Server Name column also turns to violet when the server 
is clustered. 

Now, you'll want to check the configuration of 
the SQL Server Inventory Report in the Report 
Manager. On the Home site, make sure the Security 
option under Folder Settings includes the Win- 
dows groups or user accounts that have access to 
the site. You can add them to the SSRS Browser 
role via SSMS. Also, ensure the Security option 
under Site Settings includes the same accounts in 
the System User role. In the General Properties of 
the LocalDataSource data source, update the con- 
nection string to Data Source=SQLServerName\ 
Instance; Initial Catalog=master. Make sure to 
connect to the data source using stored credentials 
from either a SQL Server standard login account 
or a Windows account. You don’t want to choose 
the Windows integrated security radio button unless 
Kerberos authentication is properly set up in the 
domain’s Active Directory. Finally, check to see 
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if the rsreportserver.config file (located under 
C:\Program Files\Microsoft SQL Server\MSRS10_50 
SQL2008R2\Reporting Services\ReportServer in my 
installation) contains the correct value to ensure the 
Kerberos authentication is set up properly using the 
following code: 


<AuthenticationTypes> 
<RSWindowsNTLM/> 
<!-- not <RSWindowsNegotiate/> 
unless Kerberos is on --> 
</AuthenticationTypes> 


Reliable and Consistent 
Information 
The solution described here is mostly dynamic. 
Important environment changes you can make once 
the inventory job has run include patch updates, 
adding RAM and CPUs, or changing the security 
authentication model on monitored SQL Server 
instances. In the future, I’d like to further improve 
this solution by automating the updating of the SQL 
Server instance XML list. In addition, you could add 
more information to the SQL Server inventory table, 
such as the default SQL Server installation path or 
the SQL Server Cumulative Update number, in the 
future. Nevertheless, this approach saves you time by 
providing reliable and consistent information about 
your SQL Server environment, enabling you to focus 
on critical database issues. SQL 
InstantDoc ID 103611 
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Left-Brain.com is the newly launched online superstore stocked with 
educational, training, and career-development materials focused on 
meeting the needs of SQL Server professionals like you. 


> SOL Server 2002 System Views Featured Product: 
= = SQL Server 2008 System Views Poster 


Face the migration learning curve head on with the SQL Server 
2008 System Views poster. An updated full-color print diagram of 
catalog views, dynamic management views, tables, and objects for 
SOL Server 2008 (including relationship types and object scope), 
this poster is a must-have for every SQL DBA migrating to or al- 
ready working with SQL Server 2008. 


Order your full-size, print copy 
today for only $14.95*! 


*Plus shipping and applicable tax. 


www.left-brain.com [In] iServer 
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Database Security 


Keep your data (and job) safe 


coverage, you don’t want to responsible for a 

security snafu. You can help protect your servers 
using proper configuration and sticking to best prac- 
tices, but the right security software can add a much 
more substantial safety net. 

See the table on the following page for some 
examples of these products. You'll notice that there’s 
a wide variety of products available to protect SQL 
Server. They come in at many different price and 
feature levels, so don’t jump into a purchase without 
doing your research. 


i na time when security failures regularly get news 


The Basics 

Be aware of security problems that are specific to 
certain versions of SQL Server, and check that any 
security products you consider fully support your 
SQL Server version. All the vendors who responded 
to our survey said their products are compatible with 
SQL Server 2008, SQL Server 2005, and SQL Server 
2000, but not all products were compatible with SQL 
Server 2008 R2 yet—so check on that if you plan to 
adopt it right away. 

Carefully check the license terms of any SQL Server 
security product you're considering. There are almost 
as many different license types as there are products in 
our buyer’s guide table. Some products are licensed by 
SQL Server instance, some by the hardware used to run 
SQL Server, and some by the hardware on which you 
install the monitoring tools. Depending on how your 
environment is set up, you could pay different amounts 
for very similar coverage. 

Don’t forget that a secure SQL Server system has 
to run on a secure Windows server. Keep your servers 
properly patched, and don’t neglect network security. 
Be cautious about security measures with the potential 
to hurt your SQL Server system’s performance, how- 
ever, because some products may not consider SQL 
Server's unique requirements. 


Dig Deep 

According to the companies that make them, all of 
the products in the table have minimal system resource 
requirements, but it’s up to you to make sure that 
your environment can take strain imposed by security 
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products. You need to choose between security products 
that run on the SQL Server and those that run on a 
separate machine. Products running on your server use 
resources there, but just about any security solution will 
put an additional load on your servers, whether directly 
or indirectly. 

Of course, all the system resources in the world 
won't help you if you don’t meet your regulatory or 
compliance standards. If your SQL Server systems 
are bound by standards like Sarbanes-Oxley or the 
Health Insurance Portability and Accountability Act, 
make sure a security product will make it easy to meet 
these requirements. If you slack on your research in this 
area, you could find yourself in a very uncomfortable 
situation—no one wants to be audited. 


v 


Zac Wiggy 


(zacwiggy @ penton.com) is an assistant editor 
for Windows IT Pro and SQL Server Magazine. 


Check the license terms of any 


SQL Server security product 
youre considering. There are 


almost as many different license 
types as there are products in our 


buyer’s guide table. 


Unlike with other types of security, such as fire- 
walls, there isn’t really a standard set of features that 
you can expect from a SQL Server 
security product. Many features are 
available on each of the solutions 
listed in our table, but most of the 
products are missing at least one of 
the features. Any given product may 
not manage user access, for example. 
Or it might lack real-time monitoring 
and alerting functions. This lack of 
standardization means the onus on 
administrators is heavier than usual 


D Editors Note 


Information in this buyers 
guide comes from vendor 
representatives and resources 
and is meant to jump-start, 
not replace, your own research. 
Also, some products might 


to understand their environments and 
needs thoroughly. [SQL] 
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have been left out, either as 
an oversight or from lack of 
vendor response. 
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Company 
Name 


dataguise 


800-813-6415 


Idera 
877-464-3372 
www.idera.com 


Sentrigo 
408-970-3300 
www.sentrigo 
.com 


Symantec 
801-995-7798 
800-441-7234 
www.symantec 
.com 


Imperva 
650-345-9000 
866-926-4678 
www.imperva 
.com 
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dgmasker 


Security 
Explorer for 
SQL Server 


Idera SQL 
compliance 
manager 


Idera SQL 
secure 


Repscan 3.0 


Hedgehog 
Enterprise 
315 


Altiris Total 
Management 
Suite 


SecureSphere 
Data Security 
Suite 


DATABASE SECURITY 


Starts at $24,995 per 
year; perpetual and term 
licenses available 


Starts at $999 per SQL 
Server managed, quantity 
discounts available; 
licensed per computer, 
one license can manage 
multiple SQL Server 
instances 


$2,995 per SQL Server 
instance 


$995 per SQL Server 
instance 


Starts at $1,000; priced 
by number of database 
servers to be scanned, 
special consultant 
pricing available 


Licensed based on CPU 
cores in monitored 
databases; complete 
multi-core systems start 
at $5,600 


$220 with Essential 
Maintenance; licensed 
per server monitored 
and/or patched; optimal 
environment is 500- 
1,000 servers 


Stand-alone products 
start at $12,500; 
licensed by database 
throughput; vulnerability 
assessments and user 
rights management 
licensed per database 
instance 


Other 
Platforms 
Supported 


Oracle, DB2, 
MySQL, 
Teradata, 
Sybase, 
Access 


None 


None 


None 


Oracle (8i or 
later) 


Oracle (8i 
or later) 
and Sybase 
(ASE12.5) 


SQL Express 
2005 for 
very small 
environments 


Oracle, 
Sybase, 
DB2 (LUW, 
2/0S and 
DB2/400), 
Informix, 
MySQL, and 
Teradata 


Where Product 
Runs 


Server and/ 
or desktop 
workstation 


Remotely on 
Windows-based 
workstation or 
server or locally 
on SQL Server 


Server 


Server 


Product is a 
Windows client 


Server 


Server 


Combination of 
appliance and 
light host-based 
agents 


Protects Against Injection 
Attacks? 


Provides data masking, which 
can mitigate risks of data loss 
due to injection attacks 


No 


Protects against injection 
attacks by continuously auditing 
changes in data and permissions 
and monitoring for unauthorized 
users 


Continuous monitoring of 
permissions on databases, 
services, and objects that could 
potentially be used to infiltrate or 
attack system 


Scans for over 3,000 known 
vulnerabilities, including 
published and zero-day SQL 
injection attacks, using 
penetration testing techniques, 
also reports on vulnerable code 
within database 


Comprehensive database 
activity monitoring and intrusion 
prevention, includes predefined 
rules that protect against 
numerous attack vectors, 
including SQL injection and buffer 
overflow, local sensor detects the 
signature of attack by monitoring 
memory and alerts in real-time 
or can also be configured to 
terminate the user session, 
signatures updated regularly 


No 


Includes integrated web 
application firewall to block SQL 
injection attacks at the web 
server, an additional layer of 
defense at the database layer 
alerting on or blocking execution 
of SQL injection attacks 


Reports Offered 


Multiple options; reports 
accompanied by graphs and can 
be saved in Excel/PDF format; can 
store data in a central repository for 
use by a business intelligence tool 


Can export databases of 
permissions; additional reporting 
available in a related product 


Reports can be created in the 
Central Management Console, 
viewed in the console, deployed 

to SQL Server Reporting Services, or 
exported to Excel or PDF; offers over 
25 predefined compliance reports 


Reports can be created in the 
Central Management Console, 
viewed in the console, deployed 

to SQL Server Reporting Services, 
or exported to Excel or PDF 


Creates a set of standard 
XML-based reports that can be 
imported easily to other systems 


Pre-built standard reports can 
be viewed from the management 
console dashboard running 

in browser, or be emailed on 

a predetermined frequency; 
additional reports are easy to 
create 


Web-based table format 


Pre-defined reports and reporting 
engine lets user customize 
reports; provided as PDF or CSV, 
including graphs, pie charts, and 
data tables; interactive analysis 
views provide real-time access 
to audit data without creating 
reports 
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DATABASE SECURITY 


Allows 
Creation 
of Custom 
Database 
Security 
Policies? 


Regulatory Compliance 
Standards Supported 


Supports 
Real-Time 
Database 


Supports 
Real-Time 
Alerting? 


Manages 
Access 
Permissions? 


Vulnerability 
Assessment? 


Activity 
Monitoring? 


Finds 
Insecure 
Logins? 


Scans for 
Known 
Best 
Practices? 


Integration with 
Other Management 
Frameworks? 


Masks sensitive data in nonprod- No No Yes No Yes 
uction environments to help 

compliance requirements such 

as PCI, HIPAA, GLBA, PII, and 

Sarbanes-Oxley (SOX) 


Able to implement security required No No No Yes Yes 
by standards such as SOX, HIPAA, 

GLBA, and ITIL 

Helps ensure compliance with Yes Yes No No Yes 


requirements such as SOX, GLBA, 
HIPAA, PCI DSS, and Basel II 


Reports supply information for No Yes Yes Yes Yes 
regulations, including SOA SOX Act 

Section 404 (COSO, CobiT), VISA CISP, 

HIPAA, GLBA, NERC Standard 1200, 

FISMA (NIST 800-53 Draft), Basel Il 

Capital Accord 

(ISO 17799:2000), and PCI DSS 

v 1.1 Payment Card Industry 


Reports are valuable for auditors of No No No No Yes 
HIPAA, SOX, PCI-DSS, SAS-70, GLBA, 

and others 

Wizard-driven templates deploy Yes Yes Yes Yes Yes 


policies for SOX, PCI DSS, SAS 70, 
HIPAA, GLBA, and privacy breach 
notification laws based on industry 
best practices 


No No Yes No No Yes 


Configuration audit and assessment Yes Yes Yes Yes Yes 
policies including PCI-DSS, SOX, 

HIPAA, GLBA, Basel II, DISA-STIG, 

CIS benchmarks, IS027001 and data 

privacy acts; pre-defined reports for 

SOX, PCI-DSS, and HIPAA; integrated 

WAF often used as a compensating 

control for PCI-DSS section 3.4 
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No 


Yes 


No 


Yes 


Yes 


No 


No 


Yes 


No 


No 


Yes 


Yes 


Yes 


Yes 


No 


Yes 


Integrates into Microsoft 
Security Development 
Lifecycle 


No 


Alerts can be picked up 
by most management 
frameworks 


No 


XML reports can be 
integrated into other 
frameworks 


Logging alerts and 
activities in several 
formats, including 
Windows Event Log 


Data import and 
export to and from 
other management 
frameworks; web 
services interfaces for 
product integration 


Integrates with systems, 
including SIEM, and 
ticketing systems, 
including BMC Remedy 
and ArcSight 
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ComponentOne LiveLing 
Boost LINQ performance and improve 


data-binding options 


ea aaa LiveLing promises to boost the 
performance of Microsoft’s Language-Integrated 
Query (LINQ) operations by means of indexing 
LINQ queries. This doesn’t mean it adds indexes to 
your underlying database for LINQ to SQL queries. 
Instead, LiveLinq is persistence-ignorant—meaning 
that the indexes it creates are on your data once it’s been 
pulled into CLR memory. This, in turn, translates into 
increased performance for data-heavy applications, 
with only a minor amount of effort or coding required. 
In fact, at the simplest level, adding indexed query 
functionality is as simple as adding simple extension 
methods to your object collections, as Web Figure 1 
(www.sqimag.com, InstantDoc ID 103603) shows. 

LiveLinq also provides functionality called Live 
Views. Live Views permit the results of a LINQ query 
to remain actively attached to the in-memory data 
source or collection that was queried. This functionality 
is useful for data-binding scenarios within rich appli- 
cations such as Silverlight or Windows Presentation 
Foundation (WPF) applications. However, any object 
that you want to participate in a Live View needs to 
derive from an IndexableObject, so you can code your 
property setters to include calls to the IndexableObject’s 
OnPropertyChanging() and OnPropertyChanged() 
events. And while having to derive from a different 
base class is a downer, seeing a Live View in action as a 
data binding source is very impressive. 

Live Views, coupled with indexing capabilities, 
opens up some intriguing possibilities when it comes 
to scaling applications. For example, if you had an e- 
commerce application with a large product catalog, you 
could let site users just thump the database over and 
over again as they browsed the site. But with LiveLinq, 
you can create a middle tier that pulls in large swaths of 
data, indexes that data, then uses Live View operations 
to keep data updated as needed. You could then query 
this middle-tier object as an in-memory database to 
reduce load on your database and increase scalability. 

The LiveLinq installer deploys a large collec- 
tion of sample applications that work well with the 
accompanying documentation to bring developers 
quickly up to speed on options and operations. My 
only serious criticism of LiveLing is the omission of 
examples showcasing how to work with data pulled 
out of LINQ to SQL and Entity Framework solutions. 
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Instead, the samples favor examples of pulling data out 
of DataSets (which I consider the spawn of the devil). 
It also has good examples of pulling data from XML 
or working with collections of objects. 

In putting LiveLing through the paces with a 
collection of 80,000 objects that I pulled out of my 
database, I noticed that the performance benefits 
stemming from the in-memory indexes placed upon 
DateTime properties on my objects didn’t return 
as large of a performance boost as I saw on other 
data types. Most likely, this was due to some slight 
differences in the DateTime precision of my objects 
versus that of my search criteria, and I would expect 
that a bit more work on my part would remedy this 
problem, as indexes in any solution are susceptible to 
mismatches like this.. I found coding with LiveLinq 
to be intuitive and fully capable of delivering on the 
performance benefits advertised. 

I heartily recommend that anyone looking to 
improve performance, scalability, or leverage rich 
data-binding capabilities pull down a trial copy of 
LiveLing and give it a whirl. SQL 
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COMPONENTONE LIVELIN 


Pros: Fully persistence-ignorant indexing capabilities that can deliver 
massive performance benefits for LINQ operations; Live View functionality 
enables compelling data-binding capabilities for Microsoft Silverlight, WPF, 
and other rich Microsoft .NET applications; can be leveraged to take signifi- 
cant load off of databases to increase application scalability 


Cons: Although the sample applications that ship with LiveLing are excellent, 
it’s missing sample projects centered around LINQ to SQL and the Entity 
Framework; Live View functionality requires objects to derive from an 
IndexableObject, which impinges on persistence ignorance and might conflict 


with other developer requirements or needs 


Price: $1,300 per license 


Rating: KKKA 


Recommendation: LiveLing has some limitations and won't be immediately 
beneficial to every kind of application or solution. However, it provides some com- 
pelling capabilities, performance benefits, and scalability options that shouldn't be 
ignored. Organizations looking to improve LINQ performance, bolster scalability, 

or increase data-binding capabilities for rich applications should definitely give 


LiveLing a test drive. 


Contact: 800-858-2739 e 412-681-4343 © www.componentone.com 
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SOVYV Assistants 


Your guid ed resources 


Microsoft SQL Server 
Conference & Expo 

Bellagio, Las Vegas, April 12-14. This exclusive SQL 
training event is brought to you by the people 
who bring you Microsoft SOL Server Connections. 
Last call to join the experts and learn how to get 
the most from the system that you have now with 
an eye toward what's coming with SQL Server 
2008 R2. 

Co-located with Microsoft Visual Studio Launch 
and Microsoft ASPnet & Silverlight conference & 
expo: 
devconnections.com 


Choose the Right Storage 
Architecture for your SQL 
Server Environment 

Choosing the right mix of features to enable in SQL 
Server, while keeping an eye on user demands and 
the company checkbook, can prove a daunting task. 
And choosing the appropriate storage architecture 
can impact critical SLAs for performance, scalability, 
and mission-critical data and service availability. 
Wendy Henry discusses how choosing the right 
storage for your environment does not have to be a 
confusing process. 

sqlmag.com/go/CriticalSQL 


Protecting SQL Server Data 
How protected is your mission-critical data? Are 
you ready for any kind of outage? If not, what 
will the cost be to your business, and can your 
business (and your resume) afford it? Join SOL 
Server expert, consultant, and author Michael 
K. Campbell for an independent overview of key 
Considerations for protecting SQL Server data 
within your organization. 
Sqlinag.com/go/ProtectSQL 


Follow us on Twitter at www.twi 
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SQL Server Power- 
Pivot: My Thoughts 


his blog post was well overdue for me personally. I did blog 

about “The Top 5 Reasons Your Company Should Consider 
SQL Server PowerPivot” (sqlmag.com, InstantDoc ID 103487), 
but that was all I said. I’ve been using SQL Server PowerPivot 
off and on for a few months now, been provided some great 
help by the product teams/fellow MVPs, and have been reading 
others’ thoughts and technical content on the product even longer. 
Finally, this blog post is not an official product review—merely 
my personal thoughts on PowerPivot and its related topics. 

I have one basic premise for this post: PowerPivot is going 
to be a long-term win for Microsoft. The product will not be a 
direct, immediate, quick win. Rather, the product will slowly but 
continuously increase in business value, grabbing higher levels of 
customer adoption. It will significantly contribute to the Microsoft 
BI platform’s overall market positioning and share as time passes. 
“BI for the Masses” continues to advance onwards. 

Which brings me to my second point: PowerPivot will increase 
the utility and demand for corporate BI. Perhaps you are thinking, 
“Anyone could say that and be right.” You’d be correct (assuming 
the product does well), but let me 


Derek Comingore 


(dcomingore @ bivoyage.com) is a principal 
architect with Bl Voyage, a Microsoft Partner 
that specializes in business intelligence services 
and solutions. He's a SQL Server MVP and holds 
several Microsoft certifications. 


explain my reasoning. Self-service 


business intelligence (SSBI) is still | have one basic premise 


fairly new. Most enterprises have 
not yet adopted SSBI solutions, 
as they are still working at getting 
corporate BI right. In the next ten 
years, as an industry, we will see the 
beginning of mainstream adoption 
of SSBI solutions. While it’s easy to 
say, “XYZ Company is doing SSBI today,” look no further than 
the other 30 companies who are not doing SSBI for proof that 
SSBL is still niche. 

Enterprises will continue to prioritize corporate BI solutions 
over SSBI ones. SSBI products provide their highest levels of 
potential value when used in conjunction with an established 
corporate BI vision and associated solutions. SSBI is but one part 
of the larger puzzle; it is not the entire puzzle. 

So how does PowerPivot stack up in the overall SSBI land- 
scape? I’m not going to tell you that PowerPivot is the perfect 
SSBI product. What I will tell you is that for a first version SSBI 
product, PowerPivot is really good. One feature Microsoft nailed 
was making PowerPivot Managed Self-Service, and the product 
encourages managed usage. 

However, what the product team does with PowerPivot for 
the next version and beyond is what counts most. At this point, 
I am confident that customers are taking notice and will begin to 
consider the product’s long-term adoption. There are some “pickle” 
scenarios still to be figured out, but Ihave confidence in the product 
teams—after all, just look at SQL Server 2008. Take a weekend and 
learn SQL Server PowerPivot; it will be time well spent. SQL 
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for this post: PowerPivot is 
going to be a long-term win 
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Free Downloadable Pocket 
Guides—each eBook a $15 value! 


g Business intelligence 

w Configuring and Troubleshooting DNS 

gw Data Warehousing 

W Group Policy 

B Integrating Outlook & SharePoint 

w Outlook Tips & Techniques 

m PowerShell 101 A 12-month print subscription 

to Windows \T Pro, the leading 
Free Archived On-Demand independent voice in the 

eLearning Events—each eventa$79 IT industry 


value! Coverage includes Exchange, 


SharePoint, PowerShell, SQL Server, vıp CD with over 25,000 
and more solution-packed articles 

(updated and delivered 
1 year of VIP access to online 2x a year) 


solution database - with every article 
ever printed in Windows \T Pro and sat 
Server Magazine, PLUS bonus web 
content posted every day on hot topics 

like Security, Exchange, Scripting, 


SharePoint, and more! 


IGHIS 


€ dowsitpro.com/go/MI9"' 


o Editor’s Tip 


DATABASE ADMINISTRATION 

Devart Improves Backup Comparison and Schema Versioning Tool 

Devart has released Schema Compare for SQL Server 2.0, a tool to help analyze and synchronize 
schema differences. New features in the latest version include: support for native SQL Server 
backups, database comparison via the command line, change tracking using version control 
systems, and report generation. dbForge Schema Compare for SQL Server starts at $149.95. 
To learn more or download the trial version, visit www.devart.com 


BACKUP AND RECOVERY 
Flash Drive-Size SQL Server 
Compression 

SoftSkills GmbH has released dbHero Pack ’n 


Got a great 
new product? 


Send announce- N 


sqlmag.com. 
—Brian Reinholz, 
production editor 


OR 


Go, a tool that lets you create compressed and 
encrypted snapshots of a SQL Server database 
and save it on a flash drive. dbHero Pack ‘n Go 
is a free companion product to dbHero, a SQL 
Server backup and restore program. According 
to the vendor, dbHero lets you back up your 
SQL Server database 2,000 percent faster than 
with traditional SQL scripts. You can purchase 
dbHero for $246.99 and download Pack ‘n Go 
for free at www.code-hero.com. 


Reimport Snapshot 
Restore Database to initial state 
SnapShot 
FullBackup 
Take a Snapshot 


for the moment 
DEV..TPCOESQL2005, 
Adventure Works 


DATABASE ADMINISTRATION 


C Diete 7) (ios ) 


CED Please select a previously saved project. or create a new one. Press F1 for heip. 


Red Gate Releases SQL Search 
Red Gate Software has released SQL Search, a plug-in for SQL Server Management Studio that 
allows users to search their database schemas to locate any term in stored procedures, functions, or 
views. According to the vendor, SQL Search allows DBAs to more efficiently label and organize—for 
example, you can label all stored procedures that aren’t finished with “TODO,” then search for that 
text string when you're ready to complete them. SQL Search is available as a free download at www 
ted-gate.com. 


WEB DEVELOPMENT 


Build Customizable Websites in Minutes 
Xlinesoft.com has announced PHPRunner 5.2, 


Š Carscars - Windows Internet Explorer 
Ow [IE hitp:/Nocalhost/pr3/carscars_list;php 


We Favorites [Z] Carscars 


~[B]4[x]/P 


1 M~- GD ~~ Pager Safety Tools Qr ” 


a tool that lets you build Web 2.0 interfaces for 
local or remote SQL Server, MySQL, Access, 
and Oracle databases. New features include: 


Print this page 


Print all pages 


Details found: 2 Page 1 of 1 


Advanced search Import 


Records Per Page: 20 v 


Inline Add || Edit selected 


Delete selected | 


Print selected 


enhanced search capabilities, record locking, 
Audit Table, and Google Maps widget support. 
The Audit Table feature logs actions such as 
login and additions/subtractions to the data- 
base. PHPRunner also restricts unauthorized 
access through CAPTCHA and by blocking 
access after three failed login attempts. New 
charts in the program include a financial chart, 
bubble chart, and gauge chart. PHPRunner is 
compatible with Windows 2000 and later, and 
pricing starts at $399. To learn more, visit www 


Audi 


mT Passenger 
Cars 


BMW 525i 


Make Model Category Color Price 
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for Windows IT Pro and SQL Server 
Magazine and author of Microsoft SQL Server 
2008 New Features (Osborne/McGraw-Hill). 


Fa 7 SQL Server 2008 


White Papers 


hen you need to dig deeper and understand 

the new SQL Server 2008 features, you'll 
find that some of the best resources are Microsoft 
white papers. Microsoft white papers are tech- 
nical guides that are written by SQL Server team 
members, MVPs, and other SQL Server experts. 
They are particularly valuable because they can 
cover a topic more thoroughly than the docu- 
mentation. Many times they offer guidance and 
best practices as well as practical implementation 
steps and advice. For SQL Server 2008, Microsoft 
has created a collection of 44 white papers. Many 
of these white papers can only be viewed online, 
but the larger ones can be downloaded. You can 
find these SQL Server 2008 white papers and 
more at http://msdn.microsoft.com/en-us/library/ 
dd631807(SQL.10).aspx. 


Reporting Services 
SharePoint Integration 
Troubleshooting 

Because it uses SQL Server as its data store, 
SharePoint is one of the biggest driving forces 
behind new SQL Server implementations. 
SharePoint can also be integrated with Reporting 
Services, so users can run reports from SharePoint 
sites. This white paper guides you through the set 
up and discusses common errors and their solu- 
tions. tinyurl.com/ylkts4u 


FILESTREAM Storage in SQL 
Server 2008 

The new SQL Server 2008 FILESTREAM data 
type lets you combine the data management and 
integrity of SQL Server’s relational database 
engine with the performance of accessing LOB 
storage from the native file system. This white 
paper explains SQL Server’s different LOB storage 
options and shows some great performance 
comparisons that help you to see when you should 
use the FILESTREAM data type. tinyurl.com/ 
ccgpt3 


Partitioned Table and Index 
Strategies Using SQL Server 2008 
Partitioning tables and indexes can be a great way to 
improve query performance—especially for very large 
tables. This 65-page downloadable white paper provides 
an overview of SQL Server 2008's table partitioning 
feature and shows how to use table partitioning to 
address the sliding window scenario and data archiving. 
tinyurl.com/cm4k9e 


Plan Caching in SQL Server 2008 
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